Web based family history software

Question [SOLVED] Search engine request for Chinese characters

5 years 1 month ago #1 by joeysun
The Chinese language has basically two variants of character presentation, traditional and simplified. Traditional characters have been around for thousands of years and are currently used in Hong Kong and Taiwan. Simplified characters, some of which have existed for hundreds of years, are used by the People's Republic of China and Singapore and have fewer strokes. Simplification was officially promoted by the People's Republic in the 1950s to increase literacy.

Can the webtrees search engine look for both simplified and traditional Chinese characters at the same time? Then the searcher would not have to switch between the character sets. The Google and Baidu search engines do this by default.

The advantage is that you would not need to enter Chinese names in both traditional and simplified characters in the name field, and then search for them separately.

The second request concerns this error message shown during a search: "Please enter more than one character." For example, the simplified surname 应 gives that error, but the traditional 應 (the same surname, Ying) does not. I am unsure if this is based on Unicode. The Unicode code point for 应 is U+5E94 and for 應 is U+61C9.
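One way to check whether Unicode itself explains the difference is to compare the two characters directly. The following is a minimal sketch, assuming PHP with the mbstring extension and a UTF-8 source file; it only inspects the characters and does not reproduce how webtrees performs its check.

```php
<?php
// Compare the simplified and traditional forms of the surname Ying.
// Assumes the mbstring extension and that this file is saved as UTF-8.
foreach (['应', '應'] as $char) {
    printf(
        "%s  U+%04X  characters: %d  UTF-8 bytes: %d\n",
        $char,
        mb_ord($char, 'UTF-8'),    // Unicode code point
        mb_strlen($char, 'UTF-8'), // length in characters
        strlen($char)              // length in bytes
    );
}
```

Both forms are a single character encoded as three UTF-8 bytes, so a plain length count treats them alike. The different behaviour presumably comes from somewhere else in the search code.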

Doug
webtrees v2.1.19 at Our Family Tree (Jiapu 家譜/家谱)
PHP Version v8.1.x, LiteSpeed V8.1, MariaDB 15.1 | protected/'hindered' by ModSecurity

5 years 1 month ago - 5 years 1 month ago #2 by xmlf
1. Using traditional and simplified characters, the search results are different.

I think you can implement this function by modifying the program. Regardless of whether the user input is a traditional or simplified Chinese character, it will be converted into a traditional Chinese character for searching.
The following code can do this.
github.com/NauxLiu/opencc4php

2. I am currently using PGV. webtrees is developed from PGV, and the same problem also appears in PGV. This error is caused by the program using JavaScript to check the length of the input, requiring length > 1.

If you are interested, you can send me a message and work with me on maintaining the family tree website.

Wang Family Website of Suining County, China
www.snwsjz.com
A family tree website that is customized, more humanized and convenient for users.
Last edit: 5 years 1 month ago by xmlf.

5 years 1 month ago #3 by fisharebest
> The following code can do this. github.com/NauxLiu/opencc4php

This is a custom PHP extension.

I think that very few web hosts will allow you to compile and install PHP extensions.

I can find lots of online tools that will translate between traditional and simplified. But I cannot find any dictionaries that I could use myself.

> The second request is this error precaution during a search: Please enter more than one character.

This check exists to prevent search results that contain so many results that PHP cannot display them due to memory / CPU limits.

github.com/fisharebest/webtrees/issues/2234

Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net

5 years 1 month ago - 5 years 1 month ago #4 by joeysun

fisharebest wrote: ...This check exists to prevent search results that contain so many results that PHP cannot display them due to memory / CPU limits.
github.com/fisharebest/webtrees/issues/2234

Thanks for adding an enhancement label on github.
From this 2009 NYT article, "Name Not on Our List? Change It, China Says": www.nytimes.com/2009/04/21/world/asia/21china.html?_r=1

By some estimates, 100 surnames cover 85 percent of China’s citizens. ...By contrast, 70,000 surnames cover 90 percent of Americans.

I understand the concern now. [strike]xmlf shared writing an algorithm on his Chinese PGV site that will display the parents of a search to try to narrow down the number of results. When he migrates to webtrees 2.x from PGV, maybe he can share that algorithm with you.[/strike]

Doug
Last edit: 5 years 1 month ago by joeysun. Reason: update

5 years 1 month ago #5 by xmlf
Sorry, I had not considered users on shared web hosts, because I am using a VPS.
The dictionary is already available in the opencc project.

The opencc project: github.com/BYVoid/OpenCC/

The dictionary : github.com/BYVoid/OpenCC/tree/master/data/dictionary

5 years 1 month ago #6 by WGroleau

xmlf wrote: I think you can implement this function by modifying the program. Regardless of whether the user input is a traditional or simplified Chinese character, it will be converted into a traditional Chinese character for searching.

Is that going to work when the name you are searching for is in the DB in simplified characters? Perhaps it would be useful to store the name both ways:
Code:
1 NAME /伟/思礼
1 NAME /韦/斯利

--
Wes Groleau
UniGen.us/

5 years 1 month ago #7 by xmlf
The ideal method is to decide whether to enable simplified/traditional conversion based on the user's browser language. If the user's browser is set to Simplified Chinese, enable the Traditional-to-Simplified conversion.

5 years 1 month ago #8 by WGroleau

xmlf wrote: The ideal method is to decide whether to enable simplified/traditional conversion based on the user's browser language. If the user's browser is set to Simplified Chinese, enable the Traditional-to-Simplified conversion.

What if the database has names in traditional, and the user’s browser says simplified? What if (like many) the browser doesn’t say, or still has English as the default? What if the DB has some names simplified and others traditional?

5 years 1 month ago #9 by xmlf

WGroleau wrote:

xmlf wrote: The ideal method is to decide whether to enable simplified/traditional conversion based on the user's browser language. If the user's browser is set to Simplified Chinese, enable the Traditional-to-Simplified conversion.

What if the database has names in traditional, and the user’s browser says simplified? What if (like many) the browser doesn’t say, or still has English as the default? What if the DB has some names simplified and others traditional?

Very simple. If the database uses traditional Chinese characters and the user's browser is set to Simplified Chinese, the program automatically uses the simplified language file. Which character conversion to perform can be decided by checking which language file is in use.
Browsers identify their language as zh-cn, zh-tw, etc. The default is determined by the operating system or by the language set in the browser. The default will not be English unless the user changes it manually or is actually using English. If the user is using English, there is no difference between the simplified display and the traditional display.

I don't know if my explanation is clear. I am translating through Google.

5 years 1 month ago #10 by WGroleau
If the computer was purchased in an English-speaking country with Windows or MacOS, the default encoding is ISOLatin1 and the language English. Neither the manufacturer nor the seller considers the purchaser’s native language. And many users don’t know how to change it.

But what if I buy it in Taiwan or change it to en-TW? Then I enter a search for 伟, for my grandfather 伟思礼 still living on the mainland, and it isn’t found because the code looked for 偉 instead. Or it finds my cousin 偉思禮 who was born and raised in Taiwan but not our grandfather. We want the names to remain in the database as entered (the names the people are actually known by).

So, if I enter either of those names, it should search for both of them. OR, I can enter the preferred name first, and put the other as an alternate when I add the person.

(Note, 伟思礼 is actually my name—Wesley—just using it as an example.)

5 years 1 month ago #11 by fisharebest
Interestingly, PHP seems to have built-in support for transliteration between traditional and simplified scripts:
Code:
php > echo transliterator_transliterate('Hans-Hant', '伟思礼');
偉思禮
php > echo transliterator_transliterate('Hant-Hans', '偉思禮');
伟思礼

This opens up many possibilities...
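One such possibility, requested earlier in the thread, is to expand a search term into both scripts and look for all of them. The following is a minimal sketch, assuming the intl extension is installed; `searchVariants()` is a hypothetical helper for illustration, not an existing webtrees function.

```php
<?php
// Sketch only: expand a search term into its simplified and traditional forms.
// Requires the intl extension; searchVariants() is a hypothetical helper,
// not a webtrees function.
function searchVariants(string $query): array
{
    $variants = [$query];
    foreach (['Hans-Hant', 'Hant-Hans'] as $id) {
        $converted = transliterator_transliterate($id, $query);
        // transliterator_transliterate() returns false on failure.
        if (is_string($converted)) {
            $variants[] = $converted;
        }
    }
    // Keep each distinct spelling once, re-indexed from 0.
    return array_values(array_unique($variants));
}

// A search would then match records containing any of the variants.
print_r(searchVariants('伟思礼'));
```

Because the stored names are never modified, this approach keeps each person's name as it was entered and only widens the query.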

5 years 1 month ago #12 by WGroleau

fisharebest wrote: Interestingly, PHP seems to have built-in support for transliteration

This opens up many possibilities...

For what it’s worth, more documentation is at
userguide.icu-project.org/transforms/general

5 years 1 month ago - 5 years 1 month ago #13 by xmlf

WGroleau wrote: If the computer was purchased in an English-speaking country with Windows or MacOS, the default encoding is ISOLatin1 and the language English. Neither the manufacturer nor the seller considers the purchaser’s native language. And many users don’t know how to change it.

But what if I buy it in Taiwan or change it to en-TW? Then I enter a search for 伟, for my grandfather 伟思礼 still living on the mainland, and it isn’t found because the code looked for 偉 instead. Or it finds my cousin 偉思禮 who was born and raised in Taiwan but not our grandfather. We want the names to remain in the database as entered (the names the people are actually known by).

So, if I enter either of those names, it should search for both of them. OR, I can enter the preferred name first, and put the other as an alternate when I add the person.

(Note, 伟思礼 is actually my name—Wesley—just using it as an example.)


I think you don't fully understand what I mean.
1. If the characters in the database are Traditional Chinese, then in the search, whatever script the user types, the input is converted to Traditional Chinese before searching.
2. On other pages, such as the home page or an individual's page, the choice can be made according to the user's browser language. If the browser is set to Simplified Chinese, use the conversion function to convert Traditional Chinese to Simplified Chinese (in fact, people in mainland China do not only use Simplified Chinese; they can also read Traditional Chinese). If the browser is set to another language, there is no need to convert.
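The browser-language decision in point 2 could be sketched roughly as follows. This is only an illustration: `preferredChineseScript()` is a hypothetical helper, the list of region codes is an assumption, and the first Chinese tag in the header wins regardless of its quality value.

```php
<?php
// Sketch only: guess the reader's preferred Chinese script from an
// Accept-Language header. preferredChineseScript() is a hypothetical helper.
function preferredChineseScript(string $acceptLanguage): ?string
{
    foreach (explode(',', strtolower($acceptLanguage)) as $item) {
        // Drop any quality value such as ";q=0.8".
        $tag = trim(explode(';', $item)[0]);
        if (preg_match('/^zh-(hans|cn|sg)\b/', $tag)) {
            return 'Hans'; // simplified
        }
        if (preg_match('/^zh-(hant|tw|hk|mo)\b/', $tag)) {
            return 'Hant'; // traditional
        }
    }
    return null; // no Chinese preference stated: do not convert
}
```

A header such as `en-TW` or `en-US` yields `null` here, so a reader whose browser is left in English gets no conversion at all.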

Last edit: 5 years 1 month ago by xmlf.

5 years 1 month ago #14 by WGroleau

xmlf wrote: If the characters in the database are Traditional Chinese, then in the search, whatever script the user types, the input is converted to Traditional Chinese before searching.

But you didn’t say the characters in the database; you said according to the browser’s language setting. Which, for the reasons I stated, may not be the user’s language. But what if the characters in the database are traditional for some persons and simplified for others, as in the example I gave, for 衛斯理 and his grandfather 伟思礼? No matter which character set you search for, you will only find one of them.

But what if you search for both?

5 years 1 month ago #15 by joeysun
The reason I asked for this search feature is that I have users in different languages searching my database for profile names that contain either simplified or traditional characters. I want to encourage my users to input new profiles using the Chinese characters they recognize as belonging to their ancestors, in either the simplified or the traditional set. A possible workaround is that I can run a script periodically to add the simplified or traditional equivalents to the GEDCOM.
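The core of such a script might look like the sketch below. It assumes the intl extension and only handles a single level-1 NAME line; `alternateNameLine()` is a hypothetical helper, and a real script would also need to skip names that already have the alternate form recorded.

```php
<?php
// Sketch only: build an extra NAME line in the other script for one
// level-1 GEDCOM NAME line. Requires the intl extension;
// alternateNameLine() is a hypothetical helper.
function alternateNameLine(string $line): ?string
{
    if (!preg_match('/^1 NAME (.+)$/u', $line, $match)) {
        return null; // not a level-1 NAME line
    }
    foreach (['Hans-Hant', 'Hant-Hans'] as $id) {
        $converted = transliterator_transliterate($id, $match[1]);
        if (is_string($converted) && $converted !== $match[1]) {
            return '1 NAME ' . $converted;
        }
    }
    return null; // the name reads the same in both scripts
}

echo alternateNameLine('1 NAME /伟/思礼'), "\n";
```

The original NAME line is left untouched; the returned line is meant to be added as an additional name, so the name as entered remains the primary one.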

Doug

5 years 1 month ago #16 by xmlf
This problem is simpler. First, you should convert the two character sets that exist in your database into one. It is very easy to convert Simplified to Traditional or Traditional to Simplified.

5 years 1 month ago #17 by WGroleau

xmlf wrote: This problem is simpler. First, you should convert the two character sets that exist in your database into one. It is very easy to convert Simplified to Traditional or Traditional to Simplified.

Absolutely not!! Each person’s name goes in the DB the way they spelled it. If I don’t know the way they spelled it, I make the best guess from documents and their culture. I never enter wrong information to make things easier on the computer.

5 years 1 month ago - 5 years 1 month ago #18 by xmlf
Why would this be a spelling mistake? The only difference between simplified and traditional characters is that the way the simplified characters are written has been simplified. My family tree has 13 books of data, covering more than 60,000 people. Since it was handed down from the Qing Dynasty, it has been written in traditional Chinese characters. However, to make it easy for people to find, read, and update, I am using Simplified Chinese characters. Is there any problem with this? There is no problem! It is equivalent to the difference between w and W.

The most important reason we use webtrees is to bring together people who are related to our family, so that we get to know each other and our relationships grow closer. Let us stick to this together, constantly updating and improving.

Last edit: 5 years 1 month ago by xmlf.

5 years 1 month ago #19 by WGroleau
You may do that if you wish. I doubt any serious genealogist would approve of changing names to support the software.

5 years 1 month ago - 5 years 1 month ago #20 by joeysun
I agree with Wes. I also have genealogy books which go back ~1.5k years with names in traditional format. I input the names from these sources as is. I find that most mainlanders have problems reading traditional characters. According to Quora, as of 2009, 22% of Chinese characters had been simplified. Generally, people in areas where traditional characters dominate are comfortable with simplified characters because of the amount of media generated from the mainland. Chinese men in patrilineal genealogies often have many names, and webtrees is capable of displaying and searching any number of names. Whenever possible, I append GEDCOM standard tags indicating the type of the name. I still feel it would be cleaner if the search function automatically looked for both traditional and simplified at the same time.

The data should not be changed for the software. The software should help analyze the data.

I have not even requested searching for Chinese homophones.

Doug
Last edit: 5 years 1 month ago by joeysun. Reason: clarity and added information