Romanian spell-check dictionary uses incorrect diacritics

Bug #214193 reported by Bogdan Butnaru
This bug affects 1 person
Affects: myspell-ro (Ubuntu)
Status: Fix Released
Importance: Low
Assigned to: Ubuntu Romanian Quality Assurance
Nominated for Hardy by Bogdan Butnaru

Bug Description

Binary package hint: myspell-ro

Hello! The dictionary for Romanian spell-checking (in all versions of Ubuntu, up to and including Hardy) stores its words in ISO-8859-2. Because that encoding has no comma-below letters, the Romanian characters ș/Ș and ț/Ț (s and t with comma below) are represented by their cedilla variants.
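
The encoding limitation can be checked with Python's standard codecs. The following is a minimal sketch. The helper name `encodable_in_latin2` is illustrative only, and characters are written as escapes so the two look-alike sets cannot be confused.

```python
# Cedilla letters (U+015E, U+015F, U+0162, U+0163) and their
# comma-below counterparts (U+0218, U+0219, U+021A, U+021B).
CEDILLA = "\u015e\u015f\u0162\u0163"
COMMA_BELOW = "\u0218\u0219\u021a\u021b"


def encodable_in_latin2(ch: str) -> bool:
    """Return True if `ch` has a code point in ISO-8859-2."""
    try:
        ch.encode("iso-8859-2")
        return True
    except UnicodeEncodeError:
        return False


for ch in CEDILLA + COMMA_BELOW:
    # Print only the code point, so the output stays ASCII.
    print(f"U+{ord(ch):04X}: {encodable_in_latin2(ch)}")
```

All four cedilla code points encode, and none of the comma-below ones do. ISO-8859-16, by contrast, does include the comma-below letters.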

On one hand, this is wrong, because the Romanian language doesn't actually contain any character with cedilla. On the other hand, the -with-comma-below characters aren't very well supported by fonts and keyboard layouts, so it'd be wrong to force people to use them.

The attached debdiff solves the problem: It adds a new myspell-ro-comma package that contains the correct diacritics.

The dictionary is generated automatically at build time from the “normal” package, so the vocabulary is exactly the same for the two versions. The -comma version is marked as replacing and providing the normal version of the package; users who choose to use the correct versions of the characters can install the -comma version manually, and they can always remove the other version.
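
The debdiff is not reproduced on this page, so what follows is only a sketch of what such a build-time conversion step might look like. It is not the patch's actual contents. It assumes:

- the sources are ro_RO.aff/ro_RO.dic in ISO-8859-2;
- the .aff file has a Hunspell-style `SET` line naming the encoding;
- the output is written as UTF-8 (ISO-8859-16 could also hold the comma-below letters).

`build_comma_dictionary`, `convert_aff` and `convert_dic` are hypothetical names.

```python
from pathlib import Path

# Map each cedilla letter to its comma-below counterpart.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S-cedilla -> S-comma
    "\u015f": "\u0219",  # s-cedilla -> s-comma
    "\u0162": "\u021a",  # T-cedilla -> T-comma
    "\u0163": "\u021b",  # t-cedilla -> t-comma
})


def convert_aff(text: str) -> str:
    """Swap the letters in .aff content and point its SET directive at UTF-8."""
    lines = []
    for line in text.translate(CEDILLA_TO_COMMA).splitlines():
        if line.split()[:1] == ["SET"]:
            line = "SET UTF-8"
        lines.append(line)
    return "\n".join(lines) + "\n"


def convert_dic(text: str) -> str:
    """Swap the letters in .dic content; the word count line is unaffected."""
    return text.translate(CEDILLA_TO_COMMA)


def build_comma_dictionary(src_dir: Path, dst_dir: Path, name: str = "ro_RO") -> None:
    """Read the ISO-8859-2 pair and write a UTF-8 comma-below copy (hypothetical layout)."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for suffix, convert in ((".aff", convert_aff), (".dic", convert_dic)):
        text = (src_dir / f"{name}{suffix}").read_text(encoding="iso-8859-2")
        (dst_dir / f"{name}{suffix}").write_text(convert(text), encoding="utf-8")
```

Because both packages are derived from one word list, the vocabulary stays identical, as the description says. Only the four letters and the declared encoding differ.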

(It might be interesting to have both dictionaries installed. But (1) I don't know how to add multiple dictionaries of the same language and (2) it could get confusing.)

Tags: patch
Bogdan Butnaru (bogdanb) wrote :

Note that this fix is compatible with (actually, orthogonal to) the one in bug #213990: if the dictionary is updated, the update will of course apply to both package versions. However, both debdiffs add a simple patch system, so only one of them should be applied as-is; for the other, it is enough to copy its patch into debian/patches.

Daniel T Chen (crimsun)
Changed in myspell-ro:
importance: Undecided → Low
status: New → Triaged
Adi Roiban (adiroiban)
Changed in myspell-ro:
assignee: nobody → ubuntu-l10n-ro
rs (struct-bylighting) wrote :

The dictionary distributed by Ubuntu is at least two years behind. Move to Fedora, or install OpenOffice.org and Firefox on Windows XP.

Jani Monoses (jani) wrote :

rs (Lucian Constantin):
You are very constructive, as always. If you have no solution to offer besides childish remarks, go and troll somewhere else.

rs (struct-bylighting) wrote :

A solution has been available from aspell.net for at least one year. Another one is distributed in the Romanian Firefox build from mozilla.org. You can also get a version of the dictionary from the OpenOffice.org website. Ubuntu developer Jani Monoses was informed about this solution a long time ago; however, his ego wouldn't let him act on it.

Regarding your response: is this what you should expect from an Ubuntu developer? Are you supposed to be called a troll? Is your real name required to appear without your consent in this context? Is this a scandal webpage?

Lucian Adrian Grijincu (lucian.grijincu) wrote :

Break it up or take it outside :)

Jani: there's no reason to out Lucian; those of us who know him from other groups know his email address.
rs: email me in private with details (I really have no idea about myspell-ro/aspell/hunspell or other *spells) and I'll try to submit an updated version.

Jani Monoses (jani) wrote :

Lucian, something just being available is not a solution in itself. It needs a normal version number to allow an easy upgrade of the existing package (last time I looked, it did not have one), and it must not break anything: the current upstream dictionaries use comma, not cedilla, etc.

It is not my ego that prevented me from acting on it, but the lack of assurance that the new dictionary has no regressions and does not cause trouble for unprepared users. I do not care what the academic standard is; if it still breaks the documents of people who interact with Windows, it is a bug and I will not endorse such a change. Or feel free to get it into Debian first; then it will come to Ubuntu automatically. But spamming this bug report and suggesting people use XP is not leading us anywhere.

And yes, your real name had better appear anywhere you post if you are willing to make statements, especially harsh ones like yours. Be prepared to take responsibility for your words and let Google record them for posterity. Alternatively, refrain from saying things you would not say to someone in person. And I called you a troll not because you use a nickname, but because you, well, trolled, here and elsewhere, and your attitude does not help your cause a bit.

Start a constructive discussion about the subject on one of the Romanian mailing lists, and there may be progress if you find people willing to help. State how it is better than the existing version of the dictionary and what the drawbacks are, and you are off to a good start. If you just complain that the current package is old and are not willing to discuss details, the situation will go on for a while, just as it has for the past two years.

rs (struct-bylighting) wrote :

Also take a look at bugs 240048 and 240050. According to the submission dates, Ubuntu developers were informed a long time ago about solutions to this specific problem.

The competitive picture at the moment is very bad for Ubuntu. Firefox and OpenOffice.org distribute correct dictionaries to Windows users. All other major Linux distributions have correct and up-to-date dictionaries. Also, on the Fedora website they don't call you a troll when they disagree with you.

rs (struct-bylighting) wrote :

I find Jani Monoses's comments patronizing, as always; no need to answer!

Lucian Adrian Grijincu wrote:
> email me in private with details

As I said, bugs 240048 and 240050 have all the necessary information. All that needs to be done is to get the official dictionaries from aspell.net and openoffice.org. On openoffice.org, the latest version is the one in the "contemporary" extension; the "classic" extension has dictionaries with "i din a". Another option is to get them from CVS at mozilla.org.

Alexandru Szasz (alexxed) wrote :

Guys, this seems like something to do rather than to talk about. What's blocking this?

Adi Roiban (adiroiban) wrote :

The only problem is the versioning number?

Switching to Fedora or Win XP is not a proper solution for this problem. We all love the restricted driver manager and we are willing to make all kinds of weird sacrifices to have it running on our computers ;)

Can we just make a "fork" and create a compatible version number? We could write down the version equivalences in the README.

Also maybe we could add both ro-comma (for comma below) and ro-cedilla (for cedilla).

We can decide the default version. Having an updated version and also having both comma and cedilla is a HUGE step forward :)

Having two versions could be confusing, but that job should be handled by the documentation team :D

Does the dictionary contain any Turkish names? :D We can use a simple script to convert from one version to the other.
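
A "simple script" of that kind could be as small as the following sketch. It is not an existing tool, and `to_comma`, `to_cedilla` and `round_trips` are hypothetical names. The conversion is a one-to-one swap of four letters, so it is reversible only for words that do not already mix in the target forms. It also cannot tell a Romanian word from a genuinely cedilla-spelled name (such as a Turkish one), which is the catch behind the joke.

```python
# One-to-one tables between cedilla (U+015E/F, U+0162/3)
# and comma-below (U+0218..U+021B) letters.
TO_COMMA = str.maketrans("\u015e\u015f\u0162\u0163", "\u0218\u0219\u021a\u021b")
TO_CEDILLA = str.maketrans("\u0218\u0219\u021a\u021b", "\u015e\u015f\u0162\u0163")


def to_comma(text: str) -> str:
    """Convert every cedilla letter to its comma-below counterpart."""
    return text.translate(TO_COMMA)


def to_cedilla(text: str) -> str:
    """Convert every comma-below letter to its cedilla counterpart."""
    return text.translate(TO_CEDILLA)


def round_trips(word: str) -> bool:
    """True if converting to comma-below and back restores `word` exactly."""
    return to_cedilla(to_comma(word)) == word


# A cedilla-only word survives the round trip; a word that already
# contains a comma-below letter does not (it comes back with a cedilla).
print(round_trips("\u015fi"), round_trips("\u0218tefan"))
```

Running `round_trips` over a whole word list before switching would flag any entries that were already partly converted.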

Jani, do you think we can have all this in Jaunty?

At least, I don't think that a rebel versioning scheme is a good reason not to update the dictionary.

I am keen to see the dictionary updated and give RS a reason to embrace Ubuntu with love ;)

Jani Monoses (jani) wrote :

Adi, it is exactly the problems with the versioning scheme and the uncertainty about how two separate dictionaries would work that keep me from considering that 'a solution has been available'. It is not just a matter of uploading a package. Please discuss it on the mailing list and split the problem into separate stages: 1) larger word count, 2) commas, 3) whatever else. Convince Lucian to adopt a sane versioning scheme so we do not have to waste time on forking and other absolutely gratuitous silliness.

These are the reasons I have been keeping away from this issue. If anyone has time to dedicate to it, be my guests :)

If there is a new package with only two changes, namely 1) a version that uses the much more suggestive and unambiguous 20090215 style and 2) *only* more words, without changing commas/cedillas, I will upload it ASAP.

To fix the other issues, keep in mind that OO does not support two dictionaries for the same language, and that most users have no clue what commas and cedillas are but care a lot about their documents reaching the destination readable.

rs (struct-bylighting) wrote :

> Adi, it is exactly the problems of the versioning scheme and of the uncertainty
> of how two separate dictionaries work is why I do not consider that 'a solution
> has been available'

What dictionaries? Why are you using the plural? Is there more than one?

If I say Fedora once more, I'll be repeating myself, so I'll try FreeBSD instead. I don't remember FreeBSD developers ever questioning the versioning of the aspell.net and openoffice.org dictionaries. I've never seen them pretending to know better what version somebody else's program should have. And they didn't have Miss Manners among them either...

Thank you for educating me, I'll move along!

Adi Roiban (adiroiban) wrote :

I cannot tell whether the version number is good or bad, but as far as I can see, aspell-ro was updated to the latest version in Jaunty.

I am confident we could find a solution for this problem.

I will try to learn more about spellcheckers and how they work, and try to come up with a solution.
I hope Jani will be sponsoring the changes.

Everyone is free to choose a versioning scheme of their own will. This is why I have no intention of convincing anyone to use a specific versioning scheme.

Jani Monoses (jani) wrote :

Lucian: yes, there are two dictionaries, one with comma and the other with cedilla.

Adi: there used to be a myspell-ro (not talking about aspell) on the OpenOffice l10n site. It used the 2005xxxx versioning and was made by Secarica, AFAICR. It was in Gentoo. I added it to Ubuntu in 2007. Then Lucian came up with another package, also called myspell-ro, but with a different versioning scheme.
So everyone is free to choose a versioning scheme of his own will, and to bear the consequences of his choice.
This is my last take on the subject (I have rehashed it every time it came up in the past two years); look at my proposal above for how to move forward.

Alexandru Szasz (alexxed) wrote :

Guys, users don't care about versioning; they just need the dictionary with the most words and the correct writing scheme. There is no sense in distributing the wrong cedilla version. As you all know, there is a law that says we should use the correct comma version in official writing, so I see no sense in giving Ubuntu users a headache. X.org comes with a comma keymap by default. "Readable" is not an excuse, because if the user is typing commas he won't be able to search the document. "Usable" is more important. Just as you *need* to install software to view documents, you can install fonts to view documents. Changing a keymap is not that simple, and distributing a spell-checker that marks correct writing as incorrect is very wrong.
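
The searching point can be checked directly. The cedilla and comma-below letters are distinct code points, and Unicode normalization does not merge them, because they decompose to different combining marks (U+0327 cedilla vs U+0326 comma below). A short illustration using only Python's standard `unicodedata` module:

```python
import unicodedata

cedilla_word = "\u015fi"  # "si" typed with s-cedilla
comma_word = "\u0219i"    # "si" typed with s-comma-below

# A plain substring search for the comma form misses the cedilla form.
print(comma_word in f"ana {cedilla_word} maria")  # False

# No normalization form makes the two spellings equal.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, cedilla_word) == unicodedata.normalize(form, comma_word)
    print(form, same)  # False for every form

# They decompose to the same base letter plus different combining marks.
print([hex(ord(c)) for c in unicodedata.normalize("NFD", "\u015f")])  # ['0x73', '0x327']
print([hex(ord(c)) for c in unicodedata.normalize("NFD", "\u0219")])  # ['0x73', '0x326']
```

A search tool would therefore need an explicit mapping between the two forms. Standard normalization alone does not bridge them.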

So Jani, could you repeat your solution with commas involved? IMHO it doesn't matter what versioning scheme you choose to use, as long as you're committed to maintaining the package.

Jani Monoses (jani) wrote :

Let's separate the issues:

1) Users do not care about versions, but that is no explanation or excuse for gratuitously picking a version that is lower than the one used by a package with the same name. Developers and packagers care about versions because that is how things work sanely. It is a basic requirement for every upstream, nothing particular to myspell-ro: play nice with distros. The fact that Fedora only recently added myspell-ro means it has not had to figure out an upgrade path from a larger version number to a smaller one. If users do not care about version numbers, please adopt a new one for myspell-ro and we are all fine. That will help Gentoo as well, which, as far as I can see, is still stuck in 2006 when it comes to myspell-ro.
Lucian, have you bugged them about it?

2) Commas. The fact that X.org adopted commas is by itself no argument. It was done the same way, without actually or necessarily evaluating what is safest for users. The fact that there is a law is not really helping either, as there are laws for many things, some realistic, some not. If we cannot make sure that a document written by a newbie does not reach an unsuspecting XP user (over 60% of the market) as junk, I am not comfortable with breaking the existing, less optimal but basically interoperable support. I want this mess cleared up just as you do, but I do not think that the technical solution of forcing it upon users is the best way. That is the easiest for _us_ developers: it gets the problem off our chest and saves us some trouble and time, but it basically throws the responsibility onto the users, who 'should know better', 'learn', or 'be educated about it'. That is a pipe dream IMHO. Just as you do not get people to use Firefox until they hear about it everywhere, you won't get them to use standard spelling just because you hope so or because it is a good idea.

As far as I am concerned, I will update the package with the extended word set if the version number is fixed. That, even if it is not what you want, is clearly an improvement over the smaller dictionary, with _no_ regression. The rest of the issue we can talk about separately until we reach an agreement, if we care about the same things, that is :)
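
Point 1 (an upgrade path from a larger version number to a smaller one) follows from how Debian-based systems order package versions. The sketch below is a simplified Python reimplementation of dpkg's comparison rules, written for illustration; it is not dpkg's own code. The rules it implements:

- compare the epoch first, then the upstream version, then the revision;
- compare digit runs numerically;
- sort `~` before everything else.

The version strings in the example are hypothetical, not the real package versions. An epoch (the `1:` prefix) is the standard Debian way to restore ordering after such a scheme change.

```python
import re


def _order(c: str) -> int:
    """dpkg-style weight: '~' < end of string/digit < letters < other symbols."""
    if c == "~":
        return -1
    if c.isdigit():
        return 0
    if c.isalpha():
        return ord(c)
    return ord(c) + 256


def _compare_part(a: str, b: str) -> int:
    """Compare two upstream versions (or two revisions) the way dpkg does."""
    while a or b:
        # Non-digit prefix: compare character by character.
        while (a and not a[0].isdigit()) or (b and not b[0].isdigit()):
            ac = _order(a[0]) if a else 0
            bc = _order(b[0]) if b else 0
            if ac != bc:
                return -1 if ac < bc else 1
            a, b = a[1:], b[1:]
        # Digit run: compare numerically; an empty run counts as 0.
        da = re.match(r"\d*", a).group()
        db = re.match(r"\d*", b).group()
        na, nb = int(da or 0), int(db or 0)
        if na != nb:
            return -1 if na < nb else 1
        a, b = a[len(da):], b[len(db):]
    return 0


def _split(version: str):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, upstream = version.split(":", 1) if ":" in version else ("0", version)
    if "-" in upstream:
        upstream, revision = upstream.rsplit("-", 1)
    else:
        revision = ""
    return int(epoch), upstream, revision


def compare_versions(v1: str, v2: str) -> int:
    """Return -1, 0 or 1 as v1 is older than, equal to, or newer than v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _compare_part(u1, u2) or _compare_part(r1, r2)


# Hypothetical: a date-style version vs. an X.Y.Z upstream release.
print(compare_versions("3.3.2-1", "20070101-1"))    # -1: would look like a downgrade
print(compare_versions("1:3.3.2-1", "20070101-1"))  # 1: an epoch restores the upgrade path
```

On real systems, `dpkg --compare-versions` is the authoritative check.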

rs (struct-bylighting) wrote :

The release numbering is imposed by aspell.net: X.Y for major releases and X.Y.Z for minor releases. It is a pretty standard numbering scheme that most projects use. A number of Unix and Linux distributions drive automatic updates based on this version. Unfortunately, we cannot change it now without breaking everything. Fedora also depends on this type of version for automatic updates (they grab the dictionaries directly from us, without going through aspell.net).

OpenOffice.org is moving away from the date-based release format. The extensions already use the X.Y.Z format. You might still find dictionaries in the old format on

http://wiki.services.openoffice.org/wiki/Dictionaries

however, they tend to be old dictionaries. The main update mechanism in OpenOffice.org today is the extension mechanism.

This is something that might work for you. We have a version X.Y.Z on the first line of the ro_RO.aff file. We use this number for customer support and to track GPL compliance. Given the version from the ro_RO.aff file, it is very easy for us to say which specific svn version the file was generated from. I can easily add the date when the dictionary was created to the same file. You can use it to drive your release numbering.
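
That suggestion could be scripted along the lines below. This thread does not give the exact layout of the first line of ro_RO.aff. The sketch therefore just searches that line for the first X.Y or X.Y.Z pattern and an optional 8-digit YYYYMMDD date (the date being rs's offer, not something the file is known to contain). `aff_version`, `read_aff_version` and the sample line are hypothetical.

```python
import re
from typing import Optional, Tuple

VERSION_RE = re.compile(r"\b(\d+)\.(\d+)(?:\.(\d+))?\b")  # X.Y or X.Y.Z
DATE_RE = re.compile(r"\b(\d{8})\b")                      # assumed YYYYMMDD


def aff_version(first_line: str) -> Tuple[Optional[Tuple[int, ...]], Optional[str]]:
    """Extract (X, Y[, Z]) and an optional YYYYMMDD date from an .aff header line."""
    m = VERSION_RE.search(first_line)
    version = tuple(int(g) for g in m.groups() if g is not None) if m else None
    d = DATE_RE.search(first_line)
    return version, (d.group(1) if d else None)


def read_aff_version(path: str):
    """Read only the first line; undecodable bytes are replaced, as digits are ASCII."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return aff_version(fh.readline())


print(aff_version("# ro_RO 3.3.2 20090215"))  # ((3, 3, 2), '20090215')
```

Returning the version as a tuple of integers lets a packaging script compare releases numerically, avoiding string-comparison pitfalls such as "3.3.10" sorting before "3.3.2".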

I guess the most important thing is to be able to reproduce a specific user dictionary from svn and to fix whatever problems are reported. This is why we keep the version in the ro_RO.aff file under svn control. How the various distribution packages are numbered is irrelevant; all the support requests still come to us.

Lucian Adrian Grijincu (lucian.grijincu) wrote :

Closing bug: from Lucid onwards the dictionary is provided by hunspell-ro which is generated from the openoffice.org-dictionary source package.
The hunspell-ro dictionary has comma-below diacritics.

Changed in myspell-ro (Ubuntu):
status: Triaged → Fix Released