Bug #199468 “highlighting (review tool) transpass columns in dou...” : Bugs : okular package : Ubuntu

Everthon Valadão (valadao) wrote on 2008-03-07:

#1

highlight problem sample (observe the erroneous highlighting of the first column) (90.5 KiB, image/png)

In KDE Bug Tracking System #161324, Alvaro-aguilera (alvaro-aguilera) wrote on 2008-04-27:

#7

Version: 0.6.3 (using 4.0.3 (KDE 4.0.3) "release 19.2", compiled sources)
Compiler: gcc
OS: Linux (x86_64) release 2.6.22.17-0.1-default

I find Okular's yellow highlighter very useful. Something is missing, however: the ability to recognize a layout with columns. Almost every PDF I read has such a format, and Okular forces me to highlight line by line instead of allowing me to mark the whole paragraph.

In KDE Bug Tracking System #161324, Pino Toscano (pinotree) wrote on 2008-04-27:

#8

Give a better title, as it's a general "problem".

In KDE Bug Tracking System #161324, Kde2eran (kde2eran) wrote on 2008-04-28:

#9

In the general case this seems to require a layout analysis, such as OCRopus.

In KDE Bug Tracking System #161324, Pino Toscano (pinotree) wrote on 2008-05-31:

#10

*** Bug 162957 has been marked as a duplicate of this bug. ***

In KDE Bug Tracking System #161324, Bui Arantsson (bui-foss) wrote on 2008-06-24:

#11

This feature indeed needs to be fixed if Okular's highlighter tool is to become useful for scientific work, seeing as almost all journals use text layouts with columns. However, I am not a programmer, and thus have no idea how it should be implemented, or whether analysis of PDF layouts is easy or not. If not, another possibility might be to allow the user to subdivide documents himself, i.e., to allow the user to "draw" borders to which the highlighter will limit itself. Almost like setting margins in a word processor, although of course purely for internal use.
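
As a rough illustration of the "drawn borders" idea, the sketch below keeps only the words whose centre falls inside a user-drawn rectangle. The tuple layouts and the function name `words_in_region` are assumptions made for this example; they are not part of Okular.

```python
def words_in_region(words, region):
    """Return the texts of words whose centre lies inside the region.

    words  : iterable of (text, x0, y0, x1, y1) bounding boxes (assumed layout)
    region : (x0, y0, x1, y1) rectangle drawn by the user (assumed layout)
    """
    rx0, ry0, rx1, ry1 = region
    selected = []
    for text, x0, y0, x1, y1 in words:
        # Use the centre of the word so a box that merely touches the
        # border of the region is not picked up by accident.
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
            selected.append(text)
    return selected
```

With a region drawn around the left column only, words from the right column are ignored even when they sit on the same line.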

In KDE Bug Tracking System #161324, Jospoortvliet (jospoortvliet) wrote on 2008-07-16:

#12

I just bumped into this when trying to do a screencast about Okular (for an upcoming KDE promo site). I must say it is rather unfortunate, and I don't think I will demo this feature as it is - it will only make people feel betrayed if they find out it doesn't work as it should. This is no stab at you guys developing this app - Okular is way cool. It's just that this issue somehow has to be solved. I have no idea if this even works properly in, for example, Adobe Acrobat Reader or any other app - I suspect this is pretty hard to do, given the little I know about layout stuff in PDFs. Pity...

Anyway. I hope this can be solved someday - somehow. Meanwhile, keep up the good work. Okular is really nice, but still has many small issues...

I do wanna say the selection mechanism you guys made (right mouseclick - select an area - copy text/picture) works SOO GOOD :D

Bug Watch Updater (bug-watch-updater) on 2008-08-07

Changed in okular:
status:	Unknown → New

In KDE Bug Tracking System #161324, Pino Toscano (pinotree) wrote on 2008-08-30:

#13

*** Bug 170102 has been marked as a duplicate of this bug. ***

Jonathan Thomas (echidnaman) wrote on 2008-09-29:

#2

Okular lives in kdegraphics from Intrepid on.

Changed in okular:
status:	New → Triaged

In KDE Bug Tracking System #161324, Pino Toscano (pinotree) wrote on 2008-11-17:

#14

*** Bug 175377 has been marked as a duplicate of this bug. ***

In KDE Bug Tracking System #161324, James-rivett-carnac (james-rivett-carnac) wrote on 2008-11-17:

#15

*** This bug has been confirmed by popular vote. ***

Bug Watch Updater (bug-watch-updater) on 2008-11-17

Changed in okular:
status:	New → Confirmed

In KDE Bug Tracking System #161324, Michal Witkowski (neuro-o2) wrote on 2009-01-09:

#16

The same can be said about the text select tool. It highlights text from both columns.

Is there any hope that this might get resolved any time soon?

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2009-01-10:

#17

Pattern recognition of what is a column and what is not based on coordinates of each character is something your brain can do very easily, but programming an algorithm that does that is not trivial by far, so I guess the answer is no.

In KDE Bug Tracking System #161324, Michal Witkowski (neuro-o2) wrote on 2009-01-10:

#18

Well, the thing is that both Adobe Reader and Foxit Reader are able to detect columns just fine (text selection, text highlight) so it's possible for sure. Maybe Okular's PDF backend is limited and doesn't provide text-layout information and that's what makes it hard. But saying that it's a hard problem solvable to a computer is just not true.

In KDE Bug Tracking System #161324, Michal Witkowski (neuro-o2) wrote on 2009-01-10:

#19

Just as I thought, it's a poppler bug. A similar problem is seen in evince (gnome pdf viewer)

https://bugs.launchpad.net/poppler/+bug/33288

In KDE Bug Tracking System #161324, Robert Knight (robertknight) wrote on 2009-01-10:

#20

> Pattern recognition of what is a column and what is not based on
> coordinates of each character is something your brain can do very easily
> but programming an algorithm that does that is not trivial by far,
> so I guess the answer is no

It is certainly possible but not trivial - Ocropus provides a free software C++ implementation of algorithms to do this if you're interested. The basic approach is to try to the largest columns of whitespace in the page and divide the text into columns based on that.

In KDE Bug Tracking System #161324, Robert Knight (robertknight) wrote on 2009-01-10:

#21

> The basic approach is to try to the largest columns of whitespace
> in the page and divide the text into columns based on that.

Sorry, that should read:

The basic approach seems to be finding the largest columns of whitespace in the page and dividing the text into columns based on that.
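
A minimal sketch of that idea, assuming word bounding boxes are already available and that y grows downward: project every box onto the x axis, take the widest stretch that no box covers as the column gutter, and read each side separately. The `Box` type, the function names and the `min_gap` threshold are hypothetical; this is not OCRopus's or Okular's actual algorithm, and it only handles a single split into two columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of one word in page coordinates (hypothetical type)."""
    text: str
    x0: float
    x1: float
    y: float


def widest_gap(boxes):
    """Return (start, end) of the widest horizontal gap no box covers, or None."""
    intervals = sorted((b.x0, b.x1) for b in boxes)
    best = None
    reach = intervals[0][1]  # right edge of everything seen so far
    for x0, x1 in intervals[1:]:
        if x0 > reach and (best is None or x0 - reach > best[1] - best[0]):
            best = (reach, x0)
        reach = max(reach, x1)
    return best


def split_columns(boxes, min_gap=10.0):
    """Split boxes into two columns at the widest gap if it is wide enough."""
    gap = widest_gap(boxes) if boxes else None
    if gap is None or gap[1] - gap[0] < min_gap:
        return [list(boxes)]  # treat the page as a single column
    middle = (gap[0] + gap[1]) / 2
    left = [b for b in boxes if b.x1 <= middle]
    right = [b for b in boxes if b.x0 >= middle]
    return [left, right]


def reading_order(boxes):
    """Read column by column; inside a column go top to bottom, left to right."""
    ordered = []
    for column in split_columns(boxes):
        ordered.extend(sorted(column, key=lambda b: (b.y, b.x0)))
    return [b.text for b in ordered]
```

Spaces between words within one line do not count as gutters here as long as words on other lines overlap them in x, which is typically the case for justified body text. Headings, figures and full-width abstracts would need extra handling.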

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2009-01-10:

#22

<quote>
Maybe Okular's PDF backend is limited and doesn't provide text-layout
information and that's what makes it hard.
</quote>

I like when people speak as if they knew how PDF works. Please, if you know that PDF provides text layout, go to the poppler project (which by coincidence I am the maintainer of) and send a patch.

<quote>
But saying that it's a hard problem solvable to a computer is just not true.
</quote>

I also like when people decide that something is not hard because someone else is able to do it. What about painting the Mona Lisa? It should not be that difficult, someone did it 500 years ago! How dare you say that painting it is difficult!

In KDE Bug Tracking System #161324, Isaac Puch Rojo (puchrojo) wrote on 2009-02-27:

#23

(In reply to comment #15)

OK, the discussion could be more diplomatic, and the comments are not very constructive. But if you want to work with scientific papers, this bug is very important.

I only want to ask whether the Okular team wants to work on this problem or not. I would respect it if they don't want to.

Thanks for the great Program!

In KDE Bug Tracking System #161324, Pino Toscano (pinotree) wrote on 2009-05-26:

#24

*** Bug 194120 has been marked as a duplicate of this bug. ***

In KDE Bug Tracking System #161324, acrocephalus (dani-valverde) wrote on 2009-09-10:

#25

What about adding a text highlight tool (just as the selection tool, but to highlight instead of selecting)? It may be an easier solution while looking for a fancier way ...

In KDE Bug Tracking System #161324, Isaac Puch Rojo (puchrojo) wrote on 2009-09-10:

#26

The solution from Dani Valverde is not perfect, but it would work.
I give my virtual vote ;-)

By, Isaac

In KDE Bug Tracking System #161324, Chosunsk (chosunsk) wrote on 2009-09-13:

#27

This bug is over three years old :(

https://launchpad.net/ubuntu/+source/poppler/+bug/33288

https://bugs.freedesktop.org/show_bug.cgi?id=3188

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2009-09-13:

#28

Three years don't make it easier to solve; we still welcome people with knowledge on how to fix it.

Krister Swenson (thekswenson) wrote on 2009-11-03:

#3

I agree that this is very annoying.

In KDE Bug Tracking System #161324, Pino Toscano (pinotree) wrote on 2010-02-02:

#29

*** Bug 225267 has been marked as a duplicate of this bug. ***

In KDE Bug Tracking System #161324, Ekin-0 (ekin-0) wrote on 2010-02-02:

#30

I agree that three years do not make it easier to solve, but it definitely makes it a must-have feature that needs to be implemented. By the way, what do developers do between releases apart from fixing bugs?

In KDE Bug Tracking System #161324, Michal Witkowski (neuro-o2) wrote on 2010-02-02:

#31

From: https://bugs.freedesktop.org/show_bug.cgi?id=3188

"Comment #45 From Praveen Thirukonda 2009-12-27 00:42:00 PST -------

it seems this bug now has a working patch and yet there has not been any
activity for the past few weeks.
It would really be great if this is committed soon as this is a really annoying
bug for many. "

It seems that there's hope :)

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2010-02-02:

#32

#23: What do we do? Well, personally I sleep 7 hours a day, work 8 hours a day, spend 2 eating and preparing things to eat, 1 travelling to and from work, 1 shopping for things to eat, and in the remaining 3 hours I try to code things for KDE, but then some user demands to know what I do with my life and those 3 hours become 2.5 hours. You should be happy I have no friends, otherwise those 2.5 hours would be 0.

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2010-02-02:

#33

#24 This is not going to help okular at all since we do not use poppler text algorithms since we support text selection for more formats than just PDF

In KDE Bug Tracking System #161324, Ekin-0 (ekin-0) wrote on 2010-02-04:

#34

I did not mean to be rude when making above statement. I really appreciate KDE and its applications in terms of the approach they have taken, i.e. abundant configurability and capability of the application. If only Okular had this feature.

claudio@ubuntu (claudio.ubuntu) wrote on 2010-03-11:

#4

This bug also applies to text selection (Tools - Text Selection Tool).

In KDE Bug Tracking System #161324, Luigi Toscano (ltosky) wrote on 2010-04-27:

#35

*** Bug 235531 has been marked as a duplicate of this bug. ***

In KDE Bug Tracking System #161324, yuval aviel (yuval-aviel) wrote on 2010-08-05:

#36

(In reply to comment #26)
> #24 This is not going to help okular at all since we do not use poppler text
> algorithms since we support text selection for more formats than just PDF

I guess that 90% of Okular users that also use annotation, use it for reading PDF files.

Maybe solving this issue with Poppler solution is not such a bad way to go.

In KDE Bug Tracking System #161324, Chosunsk (chosunsk) wrote on 2010-08-20:

#37

(In reply to comment #29)
> (In reply to comment #26)
> > #24 This is not going to help okular at all since we do not use poppler text
> > algorithms since we support text selection for more formats than just PDF
>
> I guess that 90% of Okular users that also use annotation, use it for reading
> PDF files.
>
> Maybe solving this issue with Poppler solution is not such a bad way to go.

Indeed, evince, which uses poppler algorithms, supports column selection.

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2010-08-20:

#38

Indeed, evince does not support text selection in the horde of document formats that Okular does; our selection might be better or worse, but it is [mostly] consistent among document formats.

But you don't really care, you like bashing developers because you think that will make them realize that you are right.

In KDE Bug Tracking System #161324, Peter Hedlund (peter-peterandlinda) wrote on 2010-09-17:

#39

(In reply to comment #31)
> Indeed, evince does not support text selection in the horde of document formats
> that Okular does, our selection might be better or worse but it is [mostly]
> consistent among document formats.
>
> But you don't really care, you like bashing developers because you think that
> will make them realize that you are right.

Albert, relax. But still, for many users Okular = pdf and for many users pdf = two-column scientific papers. Okular uses the poppler backend for pdf and if the backend now supports column selection, so should Okular.

I am sure there are already some if... then to handle all the formats you say Okular supports. Please consider making it a priority to add use of the poppler backend if the format where selection is happening is pdf.

Thanks,
Peter

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2010-09-17:

#40

Let me tell you a secret: i don't need advanced text selection in okular, so obviously it's not my priority.

Now let me tell you another secret: Okular is free software! So all of you who need advanced text selection are very welcome to improve Okular's text selection algorithm and send a patch, and then not only PDF text selection would be better but all the other formats too! For free!

In KDE Bug Tracking System #161324, Peter Hedlund (peter-peterandlinda) wrote on 2010-09-17:

#41

(In reply to comment #33)
> Let me tell you a secret: i don't need advanced text selection in okular, so
> obviously it's not my priority.
>
> Now let me tell you another secret: Okular is free software! So all of you who
> need advanced text selection are very welcome to improve Okular's text selection
> algorithm and send a patch, and then not only PDF text selection would be better
> but all the other formats too! For free!

Here is my secret: I have never needed anything in the program I maintain (KWordQuiz), but I think it is fun when people show interest and tell me about features they would like. If they are reasonable, I see it as a challenge to my limited self-taught programming skills to try to implement them. That's why I use free software.

I have actually looked into PDF development, as I had some interest in page manipulation features like adding and removing pages, so I know it is no walk in the park. Still, it seems someone has already done a significant part of the work in this (selection) case. Now is the time to step up to the final challenge, or is programming not fun anymore?

Well, back to Adobe Reader...

In KDE Bug Tracking System #161324, Kde2eran (kde2eran) wrote on 2010-09-17:

#42

Does poppler guess the text layout using some generic heuristic algorithm, or use some explicit information on text ordering embedded in the PDF format? If it's the latter, then Okular ought to use that embedded information, via poppler, instead of discarding it and taking a wild guess instead.

In KDE Bug Tracking System #161324, Alvaro-aguilera (alvaro-aguilera) wrote on 2010-09-17:

#43

I like the idea of supporting multiple file formats, but I guess that 99% of people (myself included) use Okular exclusively as a PDF reader. It's a pity that the format independence gets in the way of implementing features that would actually be useful for the majority of its users. I'd bet that if someone revamped KPDF, people would make the switch from one day to the next.

In KDE Bug Tracking System #161324, Robert Knight (robertknight) wrote on 2010-09-17:

#44

> Does poppler guess the text layout using some generic heuristic algorithm, or
> use some explicit information on text ordering embedded in the PDF format?

PDFs do not contain layout information about how text is structured into paragraphs and columns. As I understand it, what PDF provides is essentially a list of commands that say "draw string S at position P with font F".

I haven't looked into recent versions of Poppler but older versions had some fairly complex heuristic algorithms to try to piece together the layout given the input. These algorithms had some interesting flaws. If I remember correctly, due to numerical instability the order of paragraphs in the output text could differ significantly depending on the processor on which you ran the code.
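
To make the "draw string S at position P with font F" picture concrete, here is a small sketch of why the obvious ordering fails on a two-column page. The `DrawText` record and the sample coordinates are invented for illustration and do not reflect Poppler's internal data structures; y is assumed to grow downward.

```python
from typing import NamedTuple


class DrawText(NamedTuple):
    """One 'draw string S at position P with font F' command (illustrative)."""
    text: str
    x: float
    y: float
    font: str


# Hypothetical two-column page: left column at x=50, right column at x=320.
# The commands arrive in an arbitrary drawing order, as a PDF allows.
commands = [
    DrawText("right line 1", 320.0, 100.0, "Times"),
    DrawText("left line 1", 50.0, 100.0, "Times"),
    DrawText("left line 2", 50.0, 114.0, "Times"),
    DrawText("right line 2", 320.0, 114.0, "Times"),
]


def naive_order(cmds):
    """Sort top to bottom, then left to right, ignoring column structure."""
    return [c.text for c in sorted(cmds, key=lambda c: (c.y, c.x))]


print(naive_order(commands))
# ['left line 1', 'right line 1', 'left line 2', 'right line 2']
```

The output alternates between the two columns, which is the same symptom reported in this bug: a selection or highlight that runs across the gutter instead of down one column.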

In KDE Bug Tracking System #161324, Uetsah (uetsah) wrote on 2010-09-17:

#45

(In reply to comment #31)
> Indeed, evince does not support text selection in the horde of document formats
> that Okular does, our selection might be better or worse but it is [mostly]
> consistent among document formats.

Supporting multiple document formats consistently is great, but won't it be possible to still allow certain features to only be supported by some document formats and not others? Or to be implemented differently for each backend, where it makes sense?

The text selection user interface could still stay the same for every format, but in the background it could use whatever algorithms each respective backend provides for reading or guessing text layout structure.

So in case of PDF documents, the backend would use Poppler's heuristic algorithms. In case of OpenDocument documents, the backend would use the structural information already available in the document file. And so on...

Of course there could also be a generic algorithm that guesses the text structure independently of the document format, but as I understand it, that would be much more work...

Btw, I personally think that even with this feature missing, Okular is still the best PDF viewer out there, so thanks for the great work and for giving it away for free... :-)
If this feature is not on your priority list, that's of course totally fine, but please maybe still consider it for the future, just in case one day you're bored and don't know anything else to implement... ;-)
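
The per-backend idea in this comment could look roughly like the sketch below: one shared entry point for the user interface, a generic position-based fallback, and a backend that overrides it when the format already records text order. All class and function names are hypothetical and do not correspond to Okular's real backend interface.

```python
class GenericBackend:
    """Fallback that only knows word positions (hypothetical interface)."""

    def __init__(self, words):
        self._words = words  # list of (text, x, y) tuples, y growing downward

    def reading_order(self):
        # Format-independent guess: top to bottom, then left to right.
        return [t for t, x, y in sorted(self._words, key=lambda w: (w[2], w[1]))]


class StructuredBackend(GenericBackend):
    """Stand-in for a format whose file already records the text order."""

    def __init__(self, words, stored_order):
        super().__init__(words)
        self._stored_order = stored_order

    def reading_order(self):
        # Override: trust the structure the document itself provides.
        return list(self._stored_order)


def select_all(backend):
    """The UI-facing call is identical regardless of which backend is used."""
    return " ".join(backend.reading_order())
```

A PDF backend would sit between these two extremes: it has no stored order to trust, so it would plug in a heuristic of its own while keeping the same `reading_order` entry point.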

In KDE Bug Tracking System #161324, yves hennequin (yves-hennequin) wrote on 2010-11-01:

#46

hi
sorry not sure this is the right place for me to comment..
As a user of Okular I would also benefit from double-column recognition for annotations, etc.
My workaround is to cover text with an inline comment without text and lower the opacity, or put an ellipse and change it into a rectangle.
With this in mind, I would then also be happy if I could set the parameters of the annotations (opacity, colors, etc.) once and for all, so that when I place a new one, it already has the look I want.
I have no idea how difficult that is to do...
cheers
y.

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2010-11-01:

#47

yves, you are asking for something (I would be happy if I could set the parameters of the annotations) that has nothing to do with this bug. Please open a separate issue.

Bug Watch Updater (bug-watch-updater) on 2011-02-27

Changed in okular:
importance:	Unknown → Wishlist

In KDE Bug Tracking System #161324, Pino Toscano (pinotree) wrote on 2011-03-13:

#48

*** Bug 268334 has been marked as a duplicate of this bug. ***

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2011-06-27:

#49

*** Bug 276580 has been marked as a duplicate of this bug. ***

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2011-08-28:

#50

This has been implemented in this year's GSoC and will be available in Okular as of KDE 4.8.

You can find more info at http://tsdgeos.blogspot.com/2011/08/okular-selection-gsoc-in-depth-analysis.html

You are all encouraged to give a try to the current git master code (if you know how to compile) and give back constructive feedback.

Bug Watch Updater (bug-watch-updater) on 2011-08-29

Changed in okular-old:
status:	Confirmed → Fix Released

Maarten Bezemer (veger) wrote on 2012-03-29:

#5

Since Oneiric, Okular has its own package.

affects:

kdegraphics (Ubuntu) → okular (Ubuntu)

Maarten Bezemer (veger) wrote on 2012-03-29:

#6

Thank you for taking the time to report this bug and helping to make Ubuntu better. We are sorry that we do not always have the capacity to look at all reported bugs in a timely manner.
According to the upstream report, this issue should have been fixed since KDE 4.8, so Precise should include this fix as well.
It would help us a lot if you could test it on a currently supported Ubuntu version. When you test it and it is still an issue, kindly upload the updated logs by running apport-collect 199468 and any other logs that are relevant for this particular issue.

Changed in okular (Ubuntu):
status:	Triaged → Incomplete

In KDE Bug Tracking System #161324, Champignoom (champignoom) wrote on 2024-02-26:

#51

I'm reading a two-column pdf (https://dl.acm.org/doi/pdf/10.1145/3477113.3487272), for which the selection still doesn't work properly.

Okular Version: 23.08.5
KDE Plasma Version: 5.27.10
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.12

Is there any chance to further improve the column recognition algorithm?

Bug Watch Updater (bug-watch-updater) on 2024-02-26

Changed in kdegraphics:
importance:	Unknown → Wishlist
status:	Unknown → Confirmed
Changed in okular-old:
status:	Fix Released → Confirmed

In KDE Bug Tracking System #161324, Albert Astals Cid (aacid) wrote on 2024-02-26:

#52

I am going to close this, please open a new bug.

This has been marked as fixed for 13 years and has more than 20 users who get notified when things change here, and my guess is that they really don't want to be bothered about this particular PDF that fails, because for them it works; if it didn't work, they would have reopened this bug at some point in the 13 years that it was marked as fixed.

Bug Watch Updater (bug-watch-updater) on 2024-02-27

Changed in kdegraphics:
status:	Confirmed → Fix Released
Changed in okular-old:
status:	Confirmed → Fix Released

Affects		Status	Importance	Assigned to	Milestone
	KDE Graphics	Fix Released	Wishlist	kde-bugs #161324
	okular (Ubuntu)	Incomplete	Undecided	Unassigned

highlighting (review tool) transpass columns in double column pdf's
