Full Version: Feature request : umlaut/accented character handling in foobar core
foorious
Hi, I haven't seen this yet on the feature proposals list: http://www.hydrogenaudio.org/forums/index....showtopic=58351

When you use a given search method (e.g. Facets) to look for a name containing umlauts or accents (e.g. 'Motörhead' or 'Céline Dion'), you have two options:
- Type it exactly ('Motörhead' or 'Céline Dion') => it works
- Type it without umlauts/accents ('Motorhead' or 'Celine Dion') => it doesn't work (no search results)

It would be more than welcome if the foobar core supported the second option. Then we wouldn't have to hunt for characters unavailable on our standard keyboards every time we want to perform a particular search. So when we type 'o' or 'e', foobar would look for 'o' and 'e', but also for 'ö', 'ô', 'ó', 'é', 'è', 'ê', 'ë', etc. (uppercase and lowercase, for all umlaut/accented characters).
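
To make the request concrete, here is a minimal Python sketch of accent-insensitive matching. It is only an illustration and says nothing about how foobar2000 searches internally. It assumes that Unicode canonical decomposition (NFD) followed by dropping the combining marks is an acceptable way to fold accents; the names fold_accents and matches are made up for this example.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining marks removed and case folded."""
    # NFD splits e.g. 'ö' into 'o' followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, field: str) -> bool:
    """Substring match that ignores accents and case on both sides."""
    return fold_accents(query) in fold_accents(field)


if __name__ == "__main__":
    print(matches("Motorhead", "Motörhead"))
    print(matches("celine dion", "Céline Dion"))
```

Because both the query and the tag are folded, typing the accented form still finds the accented tag.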

Could this be put on the feature proposals list? I really miss it, and I guess other people do too.

Thank you.
kopf
This would be extremely handy.
--pv--
I was just redirected here from this closed topic.
Just a little comment to add: there are loads of languages with loads of national symbols. If this feature were to be added, I'd expect it to cover all the possibilities. E.g. you were speaking about letters such as á, é, í, ó. What about more exotic ones, e.g. those used in Slavic languages: ô, ň, ď, š, č, ľ? Do you really want to be able to search for letters with diacritics by entering their non-diacritic counterparts? Is this doable?
I'd rather not have this feature if it won't be able to detect all the characters.
Egor
A user-definable table for transliteration may be the solution for this whole problem.
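
A rough idea of what such a table could look like, as a Python sketch. The table contents and the names USER_TABLE and transliterate are hypothetical, chosen only for illustration; nothing like this exists in foobar2000 as far as this thread says.

```python
# Hypothetical user-editable table: each key is one character,
# each value is the replacement text (may be longer than one character).
USER_TABLE = {
    "ö": "o",
    "é": "e",
    "ł": "l",
    "ß": "ss",
}


def transliterate(text: str, table: dict) -> str:
    """Replace every character found in the table; leave the rest alone."""
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    print(transliterate("Motörhead", USER_TABLE))
    print(transliterate("Straße", USER_TABLE))
```

Characters that are not in the table pass through unchanged, so tags in other scripts are not touched unless the user adds entries for them.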
kode54
$ascii() also works.
--pv--
QUOTE(Egor @ Nov 14 2008, 21:22) *

A user-definable table for transliteration may be the solution for this whole problem.


QUOTE(kode54 @ Nov 14 2008, 22:00) *

$ascii() also works.

If $ascii() works reliably, then we just need to asciify all the resulting fields and that's it.
I don't think a transliteration table is a good idea; I am sure it will only introduce confusion.
Peter
Automatically applying the $ascii() effect on all fields will completely destroy results on files tagged in Russian or Japanese, for example. It needs to be more complex than that.

This feature will probably be implemented sooner or later, but it needs further research so we don't end up with something that does more harm than good for some users or needs to be configured first.
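
The problem with blanket ASCII conversion can be shown with a small Python stand-in. This is not foobar2000's $ascii(); what $ascii() actually outputs for characters without an ASCII counterpart is not stated in this thread. The hypothetical naive_ascii below simply drops everything that is not ASCII after decomposition, which is enough to show the effect.

```python
import unicodedata


def naive_ascii(text: str) -> str:
    """Crude ASCII fold: decompose, then discard every non-ASCII character."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    # Latin text with accents survives in a usable form.
    print(repr(naive_ascii("Céline")))
    # Cyrillic and Japanese text has no ASCII base letters to fall back on,
    # so nothing is left to search in.
    print(repr(naive_ascii("Кино")))
    print(repr(naive_ascii("日本国")))
```

With this kind of fold, every Russian or Japanese tag collapses to the same empty key, so they can no longer be told apart by a search.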
fireslug
QUOTE(Peter @ Nov 15 2008, 13:05) *

Automatically applying the $ascii() effect on all fields will completely destroy results on files tagged in Russian or Japanese, for example. It needs to be more complex than that.

This feature will probably be implemented sooner or later, but it needs further research so we don't end up with something that does more harm than good for some users or needs to be configured first.


Aha. Funny how these "obvious" fixes always end up having some unintended side effect that we puny end-users never consider. :)

For what it's worth, here's my "obvious" solution to this dilemma:

1. Leave the current behavior as the default.
2. Have an option to always do the straightforward $ascii() filter (with a warning about the complication Peter mentioned).
3. Have another option that would perform *both* these searches and pool all search results together. As far as I can tell, this could be safe for both Latin and non-Latin character sets, at the cost of being a bit slower.

Thoughts? I wonder what side effect I missed ... :)
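
Option 3 could be sketched like this in Python. It is an illustration under assumptions, not a description of foobar2000: naive_ascii is a hypothetical stand-in for an $ascii()-style filter, and pooled_search is a made-up name. One side effect does show up: if the query folds to an empty string (e.g. a Russian query), the folded search would match every track, so the sketch skips the folded pass in that case.

```python
import unicodedata


def naive_ascii(text: str) -> str:
    """Hypothetical stand-in for an ASCII filter: drop all non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def pooled_search(query: str, titles: list) -> list:
    """Return titles matching either the exact query or its ASCII-folded form."""
    q_exact = query.casefold()
    q_folded = naive_ascii(query).casefold()
    hits = []
    for title in titles:
        exact_hit = q_exact in title.casefold()
        # An empty folded query would match everything, so require it to be non-empty.
        folded_hit = bool(q_folded) and q_folded in naive_ascii(title).casefold()
        if exact_hit or folded_hit:
            hits.append(title)
    return hits


if __name__ == "__main__":
    library = ["Motörhead", "Céline Dion", "Кино", "Metallica"]
    print(pooled_search("motorhead", library))
    print(pooled_search("Кино", library))
```

The cost is two comparisons per field instead of one, which matches the "a bit slower" estimate.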
Moonbase
I’d vote against some »brute-force-ASCIIing«. Instead, the application should be able to use Unicode throughout, so special characters aren’t a problem.

The search difficulties would never be resolved by just trying to get rid of diacritics on a small subset of all Unicode characters, and more couldn’t be done. Or how would you transcribe, say Russian characters like Ч Щ and so on? Or Thai (ราชอาณาจักรไทย), Japanese (日本国), and so forth?

Not easily done. Also, making up a kind of »decoding table« for all relevant Unicode Groups and languages might turn out to be nearly impossible (and a waste of resources).

I guess the better solution might be either to use a multilingual keyboard driver (that allows input of those characters, and maybe more diacritics to be added to a letter), or, my suggestion, a search that allows the use of wildcards, e.g. »C?line Dion« or »C*Dion«.
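
As an illustration of the wildcard idea, here is a small Python sketch using shell-style patterns ('?' for one character, '*' for any run of characters). The function name wildcard_match is hypothetical, and this is not a description of any existing foobar2000 search syntax. Both strings are normalized to NFC first, on the assumption that a tag might store 'é' as 'e' plus a combining accent, in which case a single '?' would otherwise not cover it.

```python
import fnmatch
import unicodedata


def wildcard_match(pattern: str, field: str) -> bool:
    """Case-sensitive shell-style wildcard match on NFC-normalized strings."""
    pattern = unicodedata.normalize("NFC", pattern)
    field = unicodedata.normalize("NFC", field)
    return fnmatch.fnmatchcase(field, pattern)


if __name__ == "__main__":
    print(wildcard_match("C?line Dion", "Céline Dion"))
    print(wildcard_match("C*Dion", "Céline Dion"))
```

Note that fnmatch also treats square brackets as special, so a real search box would need to decide whether to expose or escape them.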

If it helps, I once made a Windows (2k/XP/Vista) keyboard layout which allows direct input of many European language characters (plus some extras for better typography). The available version is based on a 105-key German keyboard layout, though. Oh, well … if there’s enough need I might make one using a US keyboard layout as basis.
fireslug
QUOTE(Moonbase @ Nov 27 2008, 03:35) *

I’d vote against some »brute-force-ASCIIing«. Instead, the application should be able to use Unicode throughout, so special characters aren’t a problem.

I guess the better solution might be either to use a multilingual keyboard driver (that allows input of those characters, and maybe more diacritics to be added to a letter) [...]


You don't understand the problem. It's neither a question of foobar handling Unicode (doesn't it already?) nor of difficulty typing accented characters. The problem is in many cases NOT KNOWING what character to type. You mentioned Céline Dion. Perhaps you speak French, so you know how to spell her name. If not, are you sure it isn't Diòn? Or Díon? Or Çèlìñè? Perhaps you know, but you used FreeDB to tag your files, and someone else made an error. Best to use wildcards then for the characters that may be incorrect: **l*** D***.

QUOTE

The search difficulties would never be resolved by just trying to get rid of diacritics on a small subset of all Unicode characters, and more couldn’t be done. Or how would you transcribe, say Russian characters like Ч Щ and so on? Or Thai (ราชอาณาจักรไทย), Japanese (日本国), and so forth?


This is not about transcribing wildly different languages either, only about filtering Latin-derived alphabets by removing diacritics from their base characters.

If you don't want it, fine, don't use it. But as a European foobar user with a lot of French, German, Spanish, Scandinavian and Slavic music in my database, I'd love to see even a simple brute $ascii filter as a selectable preference ASAP. So I'm eagerly awaiting what Peter et al. can come up with.

[EDIT] Just to make this post constructive, here's another idea. A quick scan of the Unicode table (see e.g. here: http://wikisource.org/wiki/Transwiki:Table...rs,_32_to_9999) would seem to suggest that just about all code points up to 382 can be safely $ascii-filtered. The non-Latin code points are all above that. This would mean a kind of translation table, but one that is straightforward to implement.
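
A minimal Python sketch of that idea, under the assumption that "filtering" means stripping combining marks from the decomposed character. The cutoff 382 is the number from the post; the names LATIN_LIMIT and fold_latin are made up for the example, and the code is not $ascii() itself.

```python
import unicodedata

LATIN_LIMIT = 382  # cutoff proposed above (U+017E)


def fold_latin(text: str) -> str:
    """Strip diacritics only from characters at or below LATIN_LIMIT."""
    # Compose first so that an accented letter is one code point where possible.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ord(ch) <= LATIN_LIMIT:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
        else:
            # Cyrillic, Thai, Japanese etc. are passed through untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(fold_latin("Motörhead"))
    print(fold_latin("Кино"))
    print(fold_latin("Łódź"))
```

Letters that carry a stroke rather than a separate accent, such as 'Ł', have no Unicode decomposition and come through unchanged here, so they would still need explicit table entries.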

[EDIT 2] Fixed unicode link.
Moonbase
QUOTE(fireslug @ Nov 27 2008, 04:53) *

You don't understand the problem. […] The problem is in many cases NOT KNOWING what character to type.

If that (and not typing the correct characters) is the problem, there is only one answer: Get informed! (Or maybe search by genre.) I would expect that not only a professional, but also a hobbyist would know what (s)he searches for. Maybe not each and every diacritic, but most of them.

Btw, you would have to scrap the »l« and the »D« too, because we have the Polish »ł« and the »Đ« (used in Old English, Icelandic, Faroese, and some South Slavic languages). So your search would effectively become »*«. :)

To keep naming consistent within your own database, there are good databases out there, e.g. MusicBrainz. If I am unsure about the spelling of an artist or a title, I simply do some research first. It’s quite beneficial in the long term. (And I’d never use FreeDB to tag files … I use good tagging software, some research, and a keyboard to tag files … [scnr])

Well, let’s see what gets decided. As you say, I probably don’t have to use it. Or it might turn out to be a great improvement.