bibtex autokey no longer ignores uncapitalized title words

Bug #243156 reported by Andrew Burrow
This bug affects 1 person
Affects                  Status     Importance  Assigned to  Milestone
GNU Emacs                Unknown    Unknown
emacs-snapshot (Ubuntu)  Invalid    Undecided   Unassigned
emacs22 (Ubuntu)         Confirmed  Undecided   Unassigned

Bug Description

Binary package hint: emacs22

RELEASE: Ubuntu 8.04
VERSION: emacs22: 22.1-0ubuntu10.1

What I Expected to Happen
====================

This affects BibTeX mode. Emacs 20 changed the behaviour as follows:

*** Autokey generation now uses all words from the title, not just
capitalized words. To avoid conflicts with existing customizations,
bibtex-autokey-titleword-ignore is set up such that words starting with
lowerkey characters will still be ignored. Thus, if you want to use
lowercase words from the title, you will have to overwrite the
bibtex-autokey-titleword-ignore standard setting.

So for an entry

@InProceedings{,
  author = {Ganter, Bernhard and Kuznetsov, Sergei O.},
  title = {Stepwise Construction of the {Dedekind-MacNeille}
                   Completion},
  year = 1998,
  booktitle = {ICCS '98: Proceedings of the 6th International Conference
                   on Conceptual Structures},
  pages = {295--302},
  address = {Montpellier, France},
  publisher = {Springer-Verlag},
  isbn = {3-540-64791-0}
}

pressing C-c C-c should generate and add the key

    ganter98:_stepw_const_dedek_macneil_compl

Instead, it generates

    ganter98:_stepw_const_of_dedek_macneil_compl

In emacs21
=========

The variable `bibtex-autokey-titleword-ignore` is set to

  '("A" "An" "On" "The" "Eine?" "Der" "Die" "Das"
    "[^A-Z].*" ".*[^a-zA-Z0-9].*")

and it works as expected.

In emacs22
=========

The variable `bibtex-autokey-titleword-ignore` is set to

  '("A" "An" "On" "The" "Eine?" "Der" "Die" "Das"
    "[^[:upper:]].*" ".*[^[:upper:]0-9].*")

and it does not work as expected.

Workaround
=========

Restoring the old value from emacs21 does not solve the problem; instead, I have to enumerate all prepositions and conjunctions.
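
A sketch of what that enumeration might look like in an init file. It starts from the emacs22 value quoted above and adds explicit lowercase words; the particular words listed are only an illustrative assumption and would need extending for a real bibliography.

```elisp
;; Workaround sketch: name the lowercase title words to drop explicitly,
;; since the "[^[:upper:]].*" pattern does not catch them in emacs22.
;; The added word list is illustrative, not exhaustive.
(setq bibtex-autokey-titleword-ignore
      '("A" "An" "On" "The" "Eine?" "Der" "Die" "Das"
        ;; explicitly enumerated articles, prepositions and conjunctions
        "a" "an" "and" "as" "at" "by" "for" "from" "in" "of" "on"
        "or" "the" "to" "with"
        ;; the two patterns from the emacs22 default, kept unchanged
        "[^[:upper:]].*" ".*[^[:upper:]0-9].*"))
```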

pfaffman (pfaffman) wrote :

And it somehow uses ALL of the lower-case words rather than just the first letter. I'm not a regexp genius, but I couldn't find a solution other than listing all prepositions and conjunctions.

era (era) wrote :

It seems that the function bibtex-autokey-get-title sets (case-fold-search t) right at the beginning, so its attempts at finding lowercase words are doomed to fail.

Arguably [[:upper:]] and [[:lower:]] should actually work regardless of the value of the variable case-fold-search, so perhaps that should be fixed rather than this individual symptom.
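
To illustrate the interaction described above, here is a small sketch that can be evaluated in *scratch*. The helper name is made up for this comment and is not part of bibtex.el, and it assumes each ignore pattern has to match the whole title word.

```elisp
;; Illustration only: how `case-fold-search' changes what the
;; ignore-list pattern "[^[:upper:]].*" matches.
(defun my-demo-lowercase-word-p (word fold)
  "Return t if WORD as a whole matches \"[^[:upper:]].*\".
FOLD is the value bound to `case-fold-search' during the match.
Hypothetical helper, not part of bibtex.el."
  (let ((case-fold-search fold))
    (and (string-match "\\`\\(?:[^[:upper:]].*\\)\\'" word) t)))

;; Case-sensitive matching: the lowercase word is recognised.
(my-demo-lowercase-word-p "of" nil)
;; A capitalized word is not matched by the pattern.
(my-demo-lowercase-word-p "Stepwise" nil)
;; With folding on, as in bibtex-autokey-get-title; compare the result.
(my-demo-lowercase-word-p "of" t)
```

The Emacs Lisp manual documents that [:upper:] also matches lower-case letters when case-fold-search is non-nil, which is why the negated class is expected to stop matching "of" in the last call.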

Incidentally, the code to loop over each individual element of the list in turn seems rather inefficient; wouldn't it be better to concatenate all elements of the list to a single big "\\|"-separated regex, and perform a single string-match?

Changed in emacs22:
status: New → Confirmed
era (era) wrote :

In the meantime, here is a simple patch which appears to fix the problem for me, on Emacs 22.2. I would appreciate it if somebody with a recent emacs-snapshot could take the time to verify that this problem still exists there (looking at CVS sources on Savannah, it ought to) and that this patch fixes it. If so, I can take care of bringing this to the attention of the Emacs maintainers.

The change to do a single regex match is not strictly necessary, but should hopefully improve scalability for long lists of ignore words, and might improve performance if the regex engine manages to do a good job with optimizing the generated expression.

The local "case-fold-search nil" and the reintroduction of [:lower:] in the final regex are the real meat of this patch.
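
For readers without the attachment, a rough sketch of the idea follows. It is not the patch itself: the variable and function names are hypothetical, and the exact form of the final regex is an assumption based on the description above.

```elisp
;; Sketch of the approach, NOT the attached patch.  Names are hypothetical.
(defvar my-titleword-ignore
  '("A" "An" "On" "The" "Eine?" "Der" "Die" "Das"
    "[^[:upper:]].*" ".*[^[:upper:][:lower:]0-9].*")
  "Example ignore list with [:lower:] restored in the final pattern.")

(defun my-titleword-ignored-p (word)
  "Return t if WORD as a whole matches a pattern in `my-titleword-ignore'.
The patterns are joined into one anchored regexp, and the match is
made with `case-fold-search' bound to nil."
  (let ((case-fold-search nil)
        (regexp (concat "\\`\\(?:"
                        (mapconcat #'identity my-titleword-ignore "\\|")
                        "\\)\\'")))
    (and (string-match regexp word) t)))

(my-titleword-ignored-p "of")          ; lowercase word, should be ignored
(my-titleword-ignored-p "The")         ; listed article, should be ignored
(my-titleword-ignored-p "Completion")  ; capitalized word, should be kept
```

Binding case-fold-search locally keeps the change confined to this one match, so the rest of the key generation is unaffected.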

Repro steps in some more detail:

1. Create a file /tmp/nst.bib and paste in the example entry from the problem report
2. In that buffer, press C-c C-c
3. Observe the suggested title in the minibuffer. (Press C-g to discard it and quit.)

What should have happened and what does happen are as in the original report.

era (era) wrote :

Reproed and fix confirmed on emacs-snapshot 1:20081013-1. The patch needed some minor editing before it would apply (a comment in the second hunk just before the actual diff had been reformatted with `quotes').

era (era) wrote :

> The patch needed some minor editing before it would apply
> (a comment in the second hunk just before the actual diff
> had been reformatted with `quotes').

Third hunk, actually.

           ;; Ignore words matched by one of the elements of
-          ;; bibtex-autokey-titleword-ignore
+          ;; `bibtex-autokey-titleword-ignore'

era (era) wrote :

I forwarded this bug to the Emacs maintainers. I'm also attaching an updated patch for emacs-snapshot 1:20090207

era (era) wrote :

Upstream accepted and committed the patch. I'm not sure if the Launchpad tracker for Emacs upstream bugs works yet, so noting it here.

Changed in emacs-snapshot (Ubuntu):
status: New → Invalid