TOPIC:

accented letters 8 years 9 months ago #1

  • lexoulu
  • Topic Author
  • New
  • Posts: 38
New text to be translated:

Usernames are case-insensitive and ignore accented letters, so that “chloe”, “chloë”, and “Chloe” are considered to be the same. (Located in webtrees/help_text.php:2890)


Some diacritical marks, such as the acute ( ´ ) and grave ( ` ), are often called accents.
The Scandinavian languages, however, treat the characters with diacritics ä, ö and å as separate letters of the alphabet.

I'm just wondering about the significance of the default character set (UTF-8 Unicode) used in webtrees when accented letters are involved.
Secondly, for the Nordic region, some may start to think the ÄäÖö letters may / may not be ignored. For them, these are just 'normal' letters of the alphabet. I'm keeping in mind that most users, even admins, are not that well versed in the mysteries of diacritical marks.

So, looking at the technical UTF-8 Unicode capabilities of webtrees, is the text to be translated accurate as far as accented letters are concerned?
Would "hätö", "hÄtö", "hÄtÖ" be considered the same?
openSUSE 12.1, Apache 2.2, PHP 5.3, MySQL 5.5

Re: accented letters 8 years 9 months ago #2

  • fisharebest
  • Administrator
  • Posts: 13364
<<Usernames are case-insensitive and ignore accented letters>>

This is, perhaps, a bit of a "user-friendly" simplification.

The full answer is that accents are ignored according to the Unicode Collation Algorithm (MySQL's utf8_unicode_ci collation), which is maintained by the Unicode Consortium.

<<Would "hätö", "hÄtö", "hÄtÖ" be considered the same? >>

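Yes. You can test this with a simple SQL query.

The first example shows that upper- and lower-case accented letters are considered the same; the second shows that "a" and "ä" (and "o" and "ö") are also considered the same.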
mysql> select 'hätö' collate 'utf8_unicode_ci' = 'hÄtÖ' collate utf8_unicode_ci;
+-----------------------------------------------------------------------+
| 'hätö' collate 'utf8_unicode_ci' = 'hÄtÖ' collate utf8_unicode_ci |
+-----------------------------------------------------------------------+
|                                                                     1 |
+-----------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select 'hätö' collate 'utf8_unicode_ci' = 'hatO' collate utf8_unicode_ci;
+---------------------------------------------------------------------+
| 'hätö' collate 'utf8_unicode_ci' = 'hatO' collate utf8_unicode_ci |
+---------------------------------------------------------------------+
|                                                                   1 |
+---------------------------------------------------------------------+
1 row in set (0.00 sec)
Greg Roach - fisharebest.webtrees.net
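For experimenting outside the database, here is a minimal Python sketch of the case- and accent-insensitive comparison described above. It assumes a simplified rule (decompose with Unicode NFD, drop combining accent marks, then casefold); this is only an approximation and not MySQL's utf8_unicode_ci weight table, and `fold`/`same_username` are hypothetical names. Letters that have no Unicode decomposition, such as ø, are left untouched by this rule, so MySQL's actual result for them may differ.

```python
import unicodedata


def fold(text: str) -> str:
    """Return a simplified case- and accent-insensitive comparison key.

    Approximation only: NFD-decompose, drop combining marks, casefold.
    This is not MySQL's utf8_unicode_ci collation table.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def same_username(a: str, b: str) -> bool:
    """True if two usernames collide under the simplified rule above."""
    return fold(a) == fold(b)


if __name__ == "__main__":
    print(same_username("hätö", "hÄtÖ"))    # True: only case differs
    print(same_username("hätö", "hatO"))    # True: accents and case ignored
    print(same_username("Tabøe", "Taboe"))  # False: ø has no decomposition here
```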

Re: accented letters 8 years 9 months ago #3

  • ToyGuy
  • Moderator
  • Live like it's Christmas every day - Santa Stephen
  • Posts: 4925
Lex

Secondly, for the Nordic region, some may start to think the ÄäÖö letters may / may not be ignored.

I'm not sure how relevant the handling of usernames, as opposed to the regular family tree data, would be for your users. Are you saying that your users would think they should not use the ÄäÖö letters because they 'may be ignored'? They are not ignored; they are just not treated as different letters when collating for sorting purposes.
Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com, a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28

Re: accented letters 8 years 9 months ago #4

  • lexoulu
  • Topic Author
  • New
  • Posts: 38
It is more a question of translating with examples that use äöå. It is more important to show something typically used in their language.
I'm just checking so that I do not mislead anyone by claiming that äöå are ignored if that is not really true.

I was thinking that collation was not relevant, and that storing and retrieving data (the username) in a MySQL table using UTF-8 Unicode would not cause problems for ÄäÖöÅå. I've always regarded collation as more a question of sorting correctly. It seems that is an area I need to look into a bit more deeply.
openSUSE 12.1, Apache 2.2, PHP 5.3, MySQL 5.5

Re: accented letters 8 years 9 months ago #5

  • norwegian_sardines
  • Gold
  • Posts: 1716
Lex,

It would seem to me that people in the Nordic region (as well as anyone else) should enter names spelled correctly. Tabøe is not really the same as Taboe (før is not the same as for), but as far as UTF-8 collation goes, in the name list Tamb is sorted together with Tåmb and listed in the same cell.
Ken
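As a rough illustration of two spellings landing "in the same cell", here is a hypothetical Python grouping that collects names sharing the same simplified comparison key. The folding rule (NFD, drop combining marks, casefold) and the `group_surnames` name are assumptions for illustration, not how webtrees actually builds its surname list.

```python
import unicodedata
from collections import defaultdict


def fold(name: str) -> str:
    """Simplified case/accent folding: NFD, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def group_surnames(surnames: list[str]) -> dict[str, list[str]]:
    """Group spellings that share the same folded key into one 'cell'."""
    cells: defaultdict[str, list[str]] = defaultdict(list)
    for name in surnames:
        cells[fold(name)].append(name)  # keep the spelling as entered
    return dict(cells)


if __name__ == "__main__":
    print(group_surnames(["Tamb", "Tåmb", "Tabøe", "Taboe"]))
    # {'tamb': ['Tamb', 'Tåmb'], 'tabøe': ['Tabøe'], 'taboe': ['Taboe']}
```

Under this simplified rule å folds to a, so Tamb and Tåmb share a cell, while ø is kept as a separate letter.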

Re: accented letters 8 years 9 months ago #6

  • fisharebest
  • Administrator
  • Posts: 13364
Just in case there is any confusion, this issue only affects the *uniqueness* of the usernames.

You can have a user called "Tabøe" and you can also have a user called "Taboe" - but not at the same time.
Greg Roach - fisharebest.webtrees.net
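A hypothetical Python sketch of what "unique, but not at the same time" means: the registry below refuses a new username whose simplified key matches one already registered, and accepts it again once the original is removed. The `UserRegistry` class, its methods and the folding rule are assumptions for illustration; the behaviour discussed in this thread comes from the MySQL collation instead. The example uses the help text's "chloe"/"Chloë" pair because this simplified rule does not map ø to o.

```python
import unicodedata


def fold(name: str) -> str:
    """Simplified case/accent folding: NFD, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


class UserRegistry:
    """Hypothetical in-memory registry enforcing folded-username uniqueness."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    def add(self, username: str) -> bool:
        """Register username; return False if a colliding name already exists."""
        key = fold(username)
        if key in self._by_key:
            return False
        self._by_key[key] = username  # store the name exactly as typed
        return True

    def remove(self, username: str) -> None:
        """Remove username only if that exact spelling is registered."""
        key = fold(username)
        if self._by_key.get(key) == username:
            del self._by_key[key]


if __name__ == "__main__":
    registry = UserRegistry()
    print(registry.add("Chloë"))  # True
    print(registry.add("chloe"))  # False: collides with "Chloë"
    registry.remove("Chloë")
    print(registry.add("chloe"))  # True: the name is free again
```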

Re: accented letters 8 years 9 months ago #7

  • norwegian_sardines
  • Gold
  • Posts: 1716
It is interesting that I see the following: a = å, but o ≠ ø.


Ken
Re: accented letters 8 years 9 months ago #8

  • fisharebest
  • Administrator
  • Posts: 13364
The sort order will depend on the language.

This is an English page, so it will use utf8_unicode_ci sorting. I think you will get a different result in Norwegian, where A+ring will come after O+slash.

IIRC, utf8_unicode_ci sorts most "o+accent" letters the same as "o" - but not o-slash, which is a separate letter, after o.
Greg Roach - fisharebest.webtrees.net
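To illustrate why the order changes with the language, here is a hypothetical Python sort key for a simplified Norwegian/Danish alphabet in which æ, ø and å are separate letters after z. The `NORWEGIAN_ALPHABET` table and `norwegian_key` function are assumptions made for this sketch; they are not MySQL's language-specific collations and ignore many real-world rules.

```python
# Hypothetical simplified alphabet for Norwegian/Danish ordering:
# the letters æ, ø and å are separate letters that come after z.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(NORWEGIAN_ALPHABET)}


def norwegian_key(word: str) -> list[int]:
    """Sort key ranking each letter by its position in NORWEGIAN_ALPHABET."""
    # Characters not in the table sort after the whole alphabet, by code point.
    return [RANK.get(ch, len(NORWEGIAN_ALPHABET) + ord(ch)) for ch in word.casefold()]


if __name__ == "__main__":
    names = ["Tåmb", "Tamb", "Tømb", "Tomb"]
    print(sorted(names, key=norwegian_key))
    # ['Tamb', 'Tomb', 'Tømb', 'Tåmb']
```

With this key, A+ring sorts after O+slash, whereas an accent-folding comparison would put Tåmb next to Tamb.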

Re: accented letters 8 years 9 months ago #9

  • norwegian_sardines
  • Gold
  • Posts: 1716
Yes, you're correct: changing the language changes the list.

I was also noting that Qamb and Qåmb are in the same cell, but I suspect this is also because the UTF-8 Unicode collation sees a and å as the same, but o and ø as two different letters, under "English" but not under "Norwegian". "Interesting, very interesting."


Ken