Question [SOLVED] Search engine request for Chinese characters
- joeysun
- Topic Author
- Offline
- Junior Member
Can the webtrees search engine look for both simplified and traditional Chinese characters at the same time? Then the searcher would not have to switch between character sets. Google and Baidu do this by default.
The advantage is that you would not need to enter Chinese names in both traditional and simplified characters in the name field and then search for them separately.
The second request concerns this validation message during a search: "Please enter more than one character." For example, the simplified surname 应 gives that error, but the traditional 應 (the same surname, Ying) does not. I am unsure whether this is related to Unicode. The code point for 应 is U+5E94 and for 應 is U+61C9.
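For reference, here is a small Python check of the two characters (a sketch for illustration only, not webtrees code; the helper name `describe` is made up). It suggests that both forms are single code points that take three bytes in UTF-8 and one UTF-16 code unit, so a plain character-count or byte-count test would treat them alike. It does not explain why only one of them triggers the message.

```python
def describe(ch: str) -> tuple:
    """Return (code point, character count, UTF-8 byte count, UTF-16 code units)."""
    return (
        f"U+{ord(ch):04X}",
        len(ch),
        len(ch.encode("utf-8")),
        # UTF-16 code units are what JavaScript's String.length counts.
        len(ch.encode("utf-16-le")) // 2,
    )


for ch in ("应", "應"):
    print(ch, describe(ch))
```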
Doug 周
webtrees v2.1.19 at Our Family Tree (Jiapu 家譜/家谱)
PHP Version v8.1.x, LiteSpeed V8.1, MariaDB 15.1 | protected/'hindered' by ModSecurity
- xmlf
- Offline
- Junior Member
1. I think you can implement this by modifying the program. Whether the user types traditional or simplified Chinese characters, the input is converted to traditional characters before searching.
The following project can do this:
github.com/NauxLiu/opencc4php
2. I am currently using PGV. webtrees was developed from PGV, and the same problem appeared in PGV. The error is caused by the program using JavaScript to check the length of the input, requiring length > 1.
If you are interested, you can send me a message and join me in maintaining the genealogy website.
Wang Family Website of Suining County, China
www.snwsjz.com
A family tree website that is customized, more humanized and convenient for users.
- fisharebest
- Offline
- Administrator
opencc4php is a custom PHP extension.
I think that very few web hosts will allow you to compile and install PHP extensions.
I can find lots of online tools that will translate between traditional and simplified, but I cannot find any dictionaries that I could use myself.
> The second request is this error precaution during a search: Please enter more than one character.
This check exists to prevent searches that return so many results that PHP cannot display them due to memory / CPU limits.
github.com/fisharebest/webtrees/issues/2234
Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net
- joeysun
- Topic Author
- Offline
- Junior Member
> fisharebest wrote: ...This check exists to prevent search results that contain so many results that PHP cannot display them due to memory / CPU limits.
> github.com/fisharebest/webtrees/issues/2234
Thanks for adding an enhancement label on GitHub.
From this 2009 NYT article, "Name Not on Our List? Change It, China Says": www.nytimes.com/2009/04/21/world/asia/21china.html?_r=1
> By some estimates, 100 surnames cover 85 percent of China’s citizens. ...By contrast, 70,000 surnames cover 90 percent of Americans.
I understand the concern now. [strike]xmlf shared writing an algorithm on his Chinese PGV site that will display the parents of a search to try to narrow down the number of results. When he migrates to webtrees 2.x from PGV, maybe he can share that algorithm with you.[/strike]
- xmlf
- Offline
- Junior Member
The dictionary is already available in the opencc project.
The opencc project: github.com/BYVoid/OpenCC/
The dictionary : github.com/BYVoid/OpenCC/tree/master/data/dictionary
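As a rough illustration of how such a dictionary could be loaded without a compiled extension, here is a minimal Python sketch. It assumes the text dictionaries use one entry per line, with the source character, a tab, and then one or more space-separated candidate conversions. That layout is an assumption to verify against the actual files, and the sample lines and the name `parse_dictionary` are written for this example rather than copied from OpenCC.

```python
# Sample lines in the ASSUMED "source<TAB>candidate candidate ..." layout.
SAMPLE = "应\t應\n伟\t偉\n礼\t禮\n发\t發 髮\n"


def parse_dictionary(text: str) -> dict:
    """Build {source: [candidates, ...]} from tab-separated dictionary text."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, _, targets = line.partition("\t")
        if targets:
            table[source] = targets.split()
    return table


table = parse_dictionary(SAMPLE)
print(table["应"])  # single candidate
print(table["发"])  # a simplified character with more than one traditional form
```

Entries with several candidates are the reason a simple one-to-one character swap is not always enough.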
- WGroleau
- Offline
- Platinum Member
- Posts: 2152
> xmlf wrote: I think you can implement this function by modifying the program. Regardless of whether the user input is a traditional or simplified Chinese character, it will be converted into a traditional Chinese character for searching.
Is that going to work when the name you are searching for is in the DB in simplified characters? Perhaps it would be useful to store the name both ways:
--
Wes Groleau
UniGen.us/
- xmlf
- Offline
- Junior Member
The ideal method is to decide whether to enable Simplified/Traditional conversion based on the user's browser language. If the user's browser is set to Simplified Chinese, enable Traditional-to-Simplified conversion.
- WGroleau
- Offline
- Platinum Member
- Posts: 2152
> xmlf wrote: The most ideal method is to determine whether to enable the Simplified Chinese Traditional Function based on the user's browser language. If the user's browser is Simplified, enable Traditional to Simplified function.
What if the database has names in traditional, and the user’s browser says simplified? What if (like many) the browser doesn’t say, or still has English as the default? What if the DB has some names simplified and others traditional?
- xmlf
- Offline
- Junior Member
> WGroleau wrote: What if the database has names in traditional, and the user’s browser says simplified? What if (like many) the browser doesn’t say, or still has English as the default? What if the DB has some names simplified and others traditional?
> > xmlf wrote: The most ideal method is to determine whether to enable the Simplified Chinese Traditional Function based on the user's browser language. If the user's browser is Simplified, enable Traditional to Simplified function.
Very simple. If the database holds traditional Chinese characters and the user's browser is set to Simplified Chinese, the program automatically uses the simplified language file. Whether to convert characters can then be decided from which language file is in use.
Browsers identify their language as zh-cn, zh-tw, etc. The default is determined by the operating system or by the language set in the browser. The default is not English unless the user changes it manually or is using an English system. If the user is using English, there is no difference between the simplified display and the traditional display.
I don't know if my explanation is clear. I am translating through Google.
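To make the browser-language idea concrete, here is a small Python sketch that reads an HTTP Accept-Language header and guesses the preferred Chinese script. The function name `preferred_script` and the region-to-script table are assumptions made for this example, not anything taken from webtrees or PGV. It returns nothing when the header gives no usable hint, which is the case raised earlier in the thread.

```python
from typing import Optional

# ASSUMED mapping from region subtags to the script usually used there.
REGION_TO_SCRIPT = {"cn": "Hans", "sg": "Hans", "tw": "Hant", "hk": "Hant", "mo": "Hant"}


def preferred_script(accept_language: str) -> Optional[str]:
    """Return "Hans", "Hant", or None based on the first usable zh-* tag."""
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        q = 1.0  # default quality when no q= parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        candidates.append((-q, position, tag))
    # Highest q first; header order breaks ties.
    for _, _, tag in sorted(candidates):
        subtags = tag.split("-")
        if subtags[0] != "zh":
            continue
        if "hans" in subtags:
            return "Hans"
        if "hant" in subtags:
            return "Hant"
        for subtag in subtags[1:]:
            if subtag in REGION_TO_SCRIPT:
                return REGION_TO_SCRIPT[subtag]
    return None  # no Chinese tag, or a bare "zh" with no script or region


print(preferred_script("zh-CN,zh;q=0.9,en;q=0.8"))
print(preferred_script("en-US,en;q=0.9"))
```

A header such as `en-US` or `en-TW` yields no answer, so a site would still need a fallback for those visitors.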
- WGroleau
- Offline
- Platinum Member
- Posts: 2152
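If the computer was purchased in an English-speaking country with Windows or MacOS, the default encoding is ISOLatin1 and the language English. Neither the manufacturer nor the seller considers the purchaser’s native language. And many users don’t know how to change it.
But what if I buy it in Taiwan or change it to en-TW? Then I enter a search for 伟, for my grandfather 伟思礼 still living on the mainland, and it isn’t found because the code looked for 偉 instead. Or it finds my cousin 偉思禮 who was born and raised in Taiwan but not our grandfather. We want the names to remain in the database as entered (the names the people are actually known by).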
So, if I enter either of those names, it should search for both of them. OR, I can enter the preferred name first, and put the other as an alternate when I add the person.
(Note, 伟思礼 is actually my name—Wesley—just using it as an example.)
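The "search for both" idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the three-entry mapping is hand-written for the characters used in this thread, the function names are made up, and a real table would need a full dictionary and handling for characters that have more than one counterpart. The stored names are never modified; only the query is expanded.

```python
# Hand-written sample mapping (simplified -> traditional), for illustration only.
S2T = {"应": "應", "伟": "偉", "礼": "禮"}
T2S = {trad: simp for simp, trad in S2T.items()}


def convert(text: str, table: dict) -> str:
    """Replace each character that has an entry in the table; keep the rest."""
    return "".join(table.get(ch, ch) for ch in text)


def query_variants(query: str) -> set:
    """Return the query as typed plus its traditional and simplified forms."""
    return {query, convert(query, S2T), convert(query, T2S)}


def search(names: list, query: str) -> list:
    """Return the stored names that contain any variant of the query."""
    variants = query_variants(query)
    return [name for name in names if any(v in name for v in variants)]


names = ["伟思礼", "偉思禮", "应", "應", "周"]
print(search(names, "伟"))
print(search(names, "應"))
```

With this approach, a search for 伟 or for 偉 returns both the grandfather and the cousin, and each record keeps the form in which it was entered.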
- fisharebest
- Offline
- Administrator
Interestingly, PHP seems to have built-in support for transliteration
…
This opens up many possibilities...
- WGroleau
- Offline
- Platinum Member
- Posts: 2152
> fisharebest wrote: Interestingly, PHP seems to have built-in support for transliteration
> …
> This opens up many possibilities...
For what it’s worth, more documentation is at userguide.icu-project.org/transforms/general
- xmlf
- Offline
- Junior Member
> WGroleau wrote: If the computer was purchased in an English-speaking country with Windows or MacOS, the default encoding is ISOLatin1 and the language English. Neither the manufacturer nor the seller considers the purchaser’s native language. And many users don’t know how to change it.
> But what if I buy it in Taiwan or change it to en-TW? Then I enter a search for 伟, for my grandfather 伟思礼 still living on the mainland, and it isn’t found because the code looked for 偉 instead. Or it finds my cousin 偉思禮 who was born and raised in Taiwan but not our grandfather. We want the names to remain in the database as entered (the names the people are actually known by).
> So, if I enter either of those names, it should search for both of them. OR, I can enter the preferred name first, and put the other as an alternate when I add the person.
> (Note, 伟思礼 is actually my name, Wesley, just using it as an example.)
I think you don't fully understand what I mean.
1. If the characters in the database are traditional Chinese, then in a search, whichever form the user types, the input is converted to traditional Chinese before searching.
2. On other pages, such as the home page or an individual's page, the conversion can be decided by the user's browser language. If the browser is set to Simplified Chinese, use the conversion function to convert Traditional Chinese to Simplified Chinese. (In fact, this is not strictly necessary, because readers in mainland China who use Simplified Chinese also know Traditional Chinese.) If the browser is set to another language, there is no need to convert.
- WGroleau
- Offline
- Platinum Member
- Posts: 2152
> xmlf wrote: If the characters in the database are traditional Chinese. Then in the search, no matter what language the user inputs, it will be converted to Traditional Chinese for searching.
But you didn’t say the characters in the database; you said according to the browser’s language setting. Which, for the reasons I stated, may not be the user’s language. But what if the characters in the database are traditional for some persons and simplified for others, as in the example I gave, for 衛斯理 and his grandfather 伟思礼? No matter which character set you search for, you will only find one of them.
But what if you search for both?
- joeysun
- Topic Author
- Offline
- Junior Member
- xmlf
- Offline
- Junior Member
This problem is simpler. First, you should convert the two kinds of characters that exist in your database into one. It is very easy to convert Simplified to Traditional or Traditional to Simplified.
- WGroleau
- Offline
- Platinum Member
- Posts: 2152
> xmlf wrote: This problem is simpler. First, you should convert the two characters that exist in your database into one. It is very easy to convert Simplified to Traditional or Traditional to Simplified.
Absolutely not!! Each person’s name goes in the DB the way they spelled it. If I don’t know the way they spelled it, I make the best guess from documents and their culture. I never enter wrong information to make things easier on the computer.
- xmlf
- Offline
- Junior Member
The most important reason we use webtrees is to bring together people who are related to our family, get to know each other, and make our family more united. Let us keep at this together, continually updating and improving.
- WGroleau
- Offline
- Platinum Member
- Posts: 2152
- joeysun
- Topic Author
- Offline
- Junior Member
The data should not be changed for the software. The software should help analyze the data.
I have not even requested searching for Chinese homophones.