Comment 12 for bug 372164

Sigve Indregard (sigve-indregard) wrote :

You should probably copy Twitter's own twitter-text library. Their regex is a bit more complicated and written in Ruby, and I am not proficient enough in Python to push a real patch:

LATIN_ACCENTS = [(0xc0..0xd6).to_a, (0xd8..0xf6).to_a, (0xf8..0xff).to_a].flatten.pack('U*').freeze
HASHTAG_CHARACTERS = /[a-z0-9_#{LATIN_ACCENTS}]/io

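However, HASHTAG_CHARACTERS are only allowed after the start of the tag, which must be ASCII letters, digits or underscores and contain at least one letter or underscore: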

REGEXEN[:auto_link_hashtags] = /(^|[^0-9A-Z&\/]+)(#|＃)([0-9A-Z_]*[A-Z_]+#{HASHTAG_CHARACTERS}*)/io
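
For what it's worth, here is a rough Python 3 sketch of how the two Ruby expressions above might be ported. It is untested against the actual codebase and not a real patch. The names AUTO_LINK_HASHTAGS and find_hashtags are made up for illustration, and I am assuming the second alternative in the hash group is the fullwidth number sign (U+FF03). Note that Ruby's ^ matches at every line start, while Python's only does so with re.MULTILINE; since the preceding-character class also accepts a newline, the result should be the same here.

```python
import re

# Latin-1 accented letters, mirroring the Ruby ranges above.
# 0xD7 (multiplication sign) and 0xF7 (division sign) are deliberately skipped.
LATIN_ACCENTS = "".join(
    chr(cp)
    for start, end in ((0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0xFF))
    for cp in range(start, end + 1)
)

# Characters allowed in the tail of a hashtag.
HASHTAG_CHARACTERS = "[a-z0-9_" + LATIN_ACCENTS + "]"

# Hypothetical name; ported from REGEXEN[:auto_link_hashtags].
# Group 1: text before the hash, group 2: the hash sign, group 3: the tag.
# "\uFF03" (fullwidth number sign) is an assumption about the original.
AUTO_LINK_HASHTAGS = re.compile(
    "(^|[^0-9A-Z&/]+)(#|\uFF03)([0-9A-Z_]*[A-Z_]+" + HASHTAG_CHARACTERS + "*)",
    re.IGNORECASE,
)


def find_hashtags(text):
    """Return the tag text (without the hash sign) of every hashtag in text."""
    return [match.group(3) for match in AUTO_LINK_HASHTAGS.finditer(text)]
```

With this, find_hashtags("Hello #world and #café") should give both tags, while a digits-only tag such as "#123" or a hash glued to a preceding word such as "a#b" should give nothing.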