Please do NOT expect all Feature Requests to be actioned automatically. Describing your proposal here will ensure the development team are aware of it, and they will give it careful consideration.

TOPIC: [SOLVED] Search engine request for Chinese characters

Search engine request for Chinese characters 2 months 1 week ago #1

  • joeysun
For the Chinese language, there are basically two variants of character presentation: traditional and simplified. Traditional has been around for thousands of years and is currently used in Hong Kong and Taiwan. Simplified, while some of its forms have existed for hundreds of years, is used by the People's Republic of China and Singapore and has fewer strokes. Simplified characters were officially adopted in the PRC in the 1950s to increase literacy.

Can the webtrees search engine look for both simplified and traditional Chinese characters at the same time? Then the searcher does not have to switch between the character sets. The Google and Baidu search engines do this by default.

The advantage is that you will not need to enter the Chinese names in both traditional and simplified forms in the name field, and then search for them separately.

The second request concerns this error check during a search: "Please enter more than one character." For example, the simplified surname 应 gives that error, but the traditional 應 (the same surname, Ying) does not. I am unsure if this is based on Unicode. The Unicode code point for 应 is U+5E94 and for 應 is U+61C9.
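To check whether the two characters differ in length, here is a minimal sketch that reports the code point, character count and UTF-8 byte count of each. describeCharacter() is a hypothetical helper, not a webtrees function, and it assumes the mbstring extension is loaded. It does not show how webtrees itself measures the search term.

```php
<?php
// Sketch only: describeCharacter() is a hypothetical helper (needs mbstring).

/**
 * Report the code point, character count and UTF-8 byte count
 * of a string that holds a single character.
 */
function describeCharacter(string $char): array
{
    // UTF-32BE stores each character as one 4-byte big-endian integer.
    $codePoint = unpack('N', mb_convert_encoding($char, 'UTF-32BE', 'UTF-8'))[1];

    return [
        'codepoint'  => sprintf('U+%04X', $codePoint),
        'characters' => mb_strlen($char, 'UTF-8'),
        'bytes'      => strlen($char),
    ];
}

foreach (['应', '應'] as $char) {
    $info = describeCharacter($char);
    printf(
        "%s %s characters=%d bytes=%d\n",
        $char,
        $info['codepoint'],
        $info['characters'],
        $info['bytes']
    );
}
```

Both surnames come out as one character and three UTF-8 bytes, so a plain length comparison on its own would not tell them apart.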
Doug
webtrees v1.7.13 at Our Family Jiapu 家譜/家谱
PHP Version 7.0.30 | Home ISP w/ Raspberry Pi 2b

Search engine request for Chinese characters 2 months 2 days ago #2

  • xmlf
1. Using traditional and simplified characters, the search results are different.

I think you can implement this function by modifying the program. Regardless of whether the user input is a traditional or simplified Chinese character, it will be converted into a traditional Chinese character for searching.
The following code can do this.
github.com/NauxLiu/opencc4php

2. I am currently using PGV. webtrees was developed from PGV, and the same problem also appears in PGV. This error is caused by the program using JavaScript to check that the length of the input is greater than 1.

If you are interested, you can send me a message and work with me on maintaining the genealogy website.
Last Edit: 2 months 2 days ago by xmlf.

Search engine request for Chinese characters 2 months 1 day ago #3

  • fisharebest
> The following code can do this. github.com/NauxLiu/opencc4php

This is a custom PHP extension.

I think that very few web hosts will allow you to compile and install PHP extensions.

I can find lots of online tools that will translate between traditional and simplified. But I cannot find any dictionaries that I could use myself.

> The second request is this error precaution during a search: Please enter more than one character.

This check exists to prevent searches that return so many results that PHP cannot display them due to memory / CPU limits.

github.com/fisharebest/webtrees/issues/2234
Greg Roach - fisharebest.webtrees.net

Search engine request for Chinese characters 2 months 1 day ago #4

  • joeysun
fisharebest wrote:
...This check exists to prevent searches that return so many results that PHP cannot display them due to memory / CPU limits.
github.com/fisharebest/webtrees/issues/2234
Thanks for adding an enhancement label on github.
From this 2009 NYT article, "Name Not on Our List? Change It, China Says" (www.nytimes.com/2009/04/21/world/asia/21china.html?_r=1):
By some estimates, 100 surnames cover 85 percent of China’s citizens. ...By contrast, 70,000 surnames cover 90 percent of Americans.
I now understand the concern. xmlf told me that he wrote an algorithm for his Chinese PGV site that displays the parents of each person found by a search, to help narrow down the results. When he migrates from PGV to webtrees 2.x, maybe he can share that algorithm with you.
Last Edit: 2 months 1 day ago by joeysun. Reason: update

Search engine request for Chinese characters 2 months 1 day ago #5

  • xmlf
Sorry, I had not considered users on shared web hosts, because I am using a VPS.
The dictionary is already available in the opencc project.

The opencc project: github.com/BYVoid/OpenCC/

The dictionary : github.com/BYVoid/OpenCC/tree/master/data/dictionary

Search engine request for Chinese characters 1 month 4 weeks ago #6

  • WGroleau
xmlf wrote:
I think you can implement this function by modifying the program. Regardless of whether the user input is a traditional or simplified Chinese character, it will be converted into a traditional Chinese character for searching.
Is that going to work when the name you are searching for is in the DB in simplified characters? Perhaps it would be useful to store the name both ways:
1 NAME /伟/思礼
1 NAME /韦/斯利
--
Wes Groleau
UniGen.us/
PHP 7.2.15; MySQL 5.6.40; Apache

Search engine request for Chinese characters 1 month 4 weeks ago #7

  • xmlf
The ideal method is to decide whether to enable the Simplified/Traditional conversion function based on the user's browser language. If the user's browser is set to Simplified, enable the Traditional-to-Simplified function.

Search engine request for Chinese characters 1 month 4 weeks ago #8

  • WGroleau
xmlf wrote:
The ideal method is to decide whether to enable the Simplified/Traditional conversion function based on the user's browser language. If the user's browser is set to Simplified, enable the Traditional-to-Simplified function.
What if the database has names in traditional, and the user’s browser says simplified? What if (like many) the browser doesn’t say, or still has English as the default? What if the DB has some names simplified and others traditional?

Search engine request for Chinese characters 1 month 3 weeks ago #9

  • xmlf
WGroleau wrote:
xmlf wrote:
The ideal method is to decide whether to enable the Simplified/Traditional conversion function based on the user's browser language. If the user's browser is set to Simplified, enable the Traditional-to-Simplified function.
What if the database has names in traditional, and the user’s browser says simplified? What if (like many) the browser doesn’t say, or still has English as the default? What if the DB has some names simplified and others traditional?
Very simple. If the database is in traditional Chinese characters and the user's browser is set to Simplified, the program automatically uses the simplified language file. Whether to convert characters can be decided from which language file is in use.
The browser identifies its language as zh-cn, zh-tw, etc. The default is determined by the operating system or by the language set in the browser. The default is not English unless the user changes it manually or is actually using English. If the user is using English, there is no difference between the simplified display and the traditional display.

I don't know if my explanation is clear. I am translating through Google.
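The browser-language check described above could be sketched roughly as follows. preferredChineseScript() is a hypothetical helper, not something webtrees provides, and the mapping from region to script (tw, hk, mo to traditional, everything else to simplified) is an assumption made for illustration. It needs PHP 7.1 or later for the nullable return type.

```php
<?php
// Sketch only: preferredChineseScript() is a hypothetical helper, not a webtrees function.

/**
 * Guess the preferred Chinese script from an Accept-Language header value.
 * Returns 'Hans', 'Hant', or null when no Chinese language tag is listed.
 */
function preferredChineseScript(string $acceptLanguage): ?string
{
    $best = null;
    $bestQuality = -1.0;

    foreach (explode(',', $acceptLanguage) as $item) {
        $parts = explode(';', trim($item));
        $tag = strtolower(trim($parts[0]));

        // Only tags such as zh, zh-cn, zh-tw, zh-hant-hk are of interest.
        if (!preg_match('/^zh(-|$)/', $tag)) {
            continue;
        }

        // Quality defaults to 1.0 when no q= parameter is given.
        $quality = 1.0;
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^\s*q=([0-9.]+)\s*$/i', $param, $m)) {
                $quality = (float) $m[1];
            }
        }

        // An explicit script subtag wins; otherwise infer from the region.
        // Assumption: a bare "zh" or an unlisted region means simplified.
        if (strpos($tag, '-hant') !== false) {
            $script = 'Hant';
        } elseif (strpos($tag, '-hans') !== false) {
            $script = 'Hans';
        } elseif (preg_match('/-(tw|hk|mo)$/', $tag)) {
            $script = 'Hant';
        } else {
            $script = 'Hans';
        }

        // Keep the Chinese tag with the highest quality value.
        if ($quality > $bestQuality) {
            $bestQuality = $quality;
            $best = $script;
        }
    }

    return $best;
}

foreach (['zh-CN,zh;q=0.9', 'zh-TW,zh;q=0.8,en;q=0.5', 'en-US,en;q=0.9'] as $header) {
    echo $header, ' => ', var_export(preferredChineseScript($header), true), "\n";
}
```

A header with no zh tag at all yields null, so the caller would still need a fallback for browsers that are set to English.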

Search engine request for Chinese characters 1 month 3 weeks ago #10

  • WGroleau
If the computer was purchased in an English-speaking country with Windows or MacOS, the default encoding is ISOLatin1 and the language English. Neither the manufacturer nor the seller considers the purchaser’s native language. And many users don’t know how to change it.

But what if I buy it in Taiwan or change it to en-TW? Then I enter a search for 伟, for my grandfather 伟思礼 still living on the mainland, and it isn’t found because the code looked for 偉 instead. Or it finds my cousin 偉思禮 who was born and raised in Taiwan but not our grandfather. We want the names to remain in the database as entered (the names the people are actually known by).

So, if I enter either of those names, it should search for both of them. OR, I can enter the preferred name first, and put the other as an alternate when I add the person.

(Note, 伟思礼 is actually my name—Wesley—just using it as an example.)

Search engine request for Chinese characters 1 month 3 weeks ago #11

  • fisharebest
Interestingly, PHP seems to have built-in support for transliteration between traditional and simplified scripts:
php > echo transliterator_transliterate('Hans-Hant', '伟思礼');
偉思禮
php > echo transliterator_transliterate('Hant-Hans', '偉思禮');
伟思礼

This opens up many possibilities...
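A minimal sketch of one such possibility: expand the search term into its simplified and traditional forms, then search for each. chineseSearchVariants() is a hypothetical helper, not part of webtrees; the only real API it relies on is transliterator_transliterate() from the intl extension, as used in the session above. The converter is passed in as a parameter so the idea can be tried without intl. It needs PHP 7.1 or later for the nullable parameter type.

```php
<?php
// Sketch only: chineseSearchVariants() is a hypothetical helper, not part of webtrees.

/**
 * Return the distinct forms of a search term: as typed, converted to
 * traditional, and converted to simplified.
 *
 * $convert receives a transliterator ID and the text; by default it wraps
 * transliterator_transliterate() from the intl extension.
 */
function chineseSearchVariants(string $query, ?callable $convert = null): array
{
    $convert = $convert ?? function (string $id, string $text): string {
        $result = transliterator_transliterate($id, $text);

        // transliterator_transliterate() returns false on failure.
        return $result === false ? $text : $result;
    };

    $variants = [
        $query,
        $convert('Hans-Hant', $query),
        $convert('Hant-Hans', $query),
    ];

    // Drop duplicates (for example a term with no variant forms) and reindex.
    return array_values(array_unique($variants));
}

// Only run the demonstration when the intl extension is available.
if (function_exists('transliterator_transliterate')) {
    print_r(chineseSearchVariants('伟思礼'));
}
```

Each returned variant could then be matched against the names exactly as they are stored, so nothing in the database needs to change.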

Search engine request for Chinese characters 1 month 3 weeks ago #12

  • WGroleau
fisharebest wrote:
Interestingly, PHP seems to have built-in support for transliteration

This opens up many possibilities...
For what it’s worth, more documentation is at
userguide.icu-project.org/transforms/general

Search engine request for Chinese characters 1 month 3 weeks ago #13

  • xmlf
WGroleau wrote:
If the computer was purchased in an English-speaking country with Windows or MacOS, the default encoding is ISOLatin1 and the language English. Neither the manufacturer nor the seller considers the purchaser’s native language. And many users don’t know how to change it.

But what if I buy it in Taiwan or change it to en-TW? Then I enter a search for 伟, for my grandfather 伟思礼 still living on the mainland, and it isn’t found because the code looked for 偉 instead. Or it finds my cousin 偉思禮 who was born and raised in Taiwan but not our grandfather. We want the names to remain in the database as entered (the names the people are actually known by).

So, if I enter either of those names, it should search for both of them. OR, I can enter the preferred name first, and put the other as an alternate when I add the person.

(Note, 伟思礼 is actually my name—Wesley—just using it as an example.)

I think you don't fully understand what I mean.
1. If the characters in the database are traditional Chinese, then in the search, no matter which form the user inputs, it will be converted to Traditional Chinese for searching.
2. On other pages, such as the home page and the individual page, the program can first check the user's browser language. If the browser is set to Simplified Chinese, use the conversion function to convert Traditional Chinese to Simplified Chinese. (In fact, people in mainland China do not only use Simplified Chinese; they also know Traditional Chinese.) If the browser is in another language, there is no need to convert.
Last Edit: 1 month 3 weeks ago by xmlf.

Search engine request for Chinese characters 1 month 3 weeks ago #14

  • WGroleau
xmlf wrote:
If the characters in the database are traditional Chinese, then in the search, no matter which form the user inputs, it will be converted to Traditional Chinese for searching.
But you didn’t say the characters in the database; you said according to the browser’s language setting. Which, for the reasons I stated, may not be the user’s language. But what if the characters in the database are traditional for some persons and simplified for others, as in the example I gave, for 衛斯理 and his grandfather 伟思礼? No matter which character set you search for, you will only find one of them.

But what if you search for both?
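One way to get the effect of searching for both, without changing what is stored, is to compare in a single script: convert temporary copies of the query and of each stored name, and return the stored names untouched. The sketch below is only an illustration. searchNames() is a hypothetical function, and the two-character table stands in for a real converter such as transliterator_transliterate('Hant-Hans', ...) so that it runs without the intl extension.

```php
<?php
// Sketch only: searchNames() and $toSimplified are hypothetical, not webtrees code.

/**
 * Find stored names containing $query, comparing in one script only.
 * Names are returned exactly as entered; conversion is applied only to
 * the temporary copies used for the comparison.
 */
function searchNames(array $names, string $query, callable $toSimplified): array
{
    $needle = $toSimplified($query);
    if ($needle === '') {
        return [];
    }

    $matches = array_filter($names, function (string $name) use ($needle, $toSimplified): bool {
        return strpos($toSimplified($name), $needle) !== false;
    });

    return array_values($matches);
}

// Stand-in converter covering only the characters used in this example.
$toSimplified = function (string $text): string {
    return strtr($text, ['偉' => '伟', '禮' => '礼']);
};

$names = ['伟思礼', '偉思禮', '韦斯利'];

// Either spelling of the query finds both 伟思礼 and 偉思禮.
print_r(searchNames($names, '伟', $toSimplified));
print_r(searchNames($names, '偉', $toSimplified));
```

Converting to simplified for the comparison merges characters that differ only in script, which is the intent here, while the database keeps the names the people are actually known by.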

Search engine request for Chinese characters 1 month 3 weeks ago #15

  • joeysun
The reason I asked for this search feature is that I have users in different languages searching my database for profile names that contain either simplified or traditional characters. I want to encourage my users to input new profiles using the Chinese characters they recognize as belonging to their ancestors, in either the simplified or the traditional set. A possible workaround is that I can periodically run a script to add simplified or traditional characters to the GEDCOM.
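A rough sketch of what such a workaround script might do for one individual record: after each level-1 NAME structure, add a second NAME line holding the converted form, unless that form is already present. addVariantNames() is hypothetical, the record ID and the stand-in converter are made up for the example, and the sketch assumes "\n" line endings and one record at a time.

```php
<?php
// Sketch only: addVariantNames() is a hypothetical helper, not a webtrees function.

/**
 * Add a converted copy of every level-1 NAME in one GEDCOM record.
 * The new line goes after the original NAME and its subordinate lines,
 * and is skipped when the converted name already exists in the record.
 */
function addVariantNames(string $record, callable $convert): string
{
    $lines = explode("\n", $record);

    // Collect the names already present so nothing is added twice.
    $existing = [];
    foreach ($lines as $line) {
        if (preg_match('/^1 NAME (.+)$/u', $line, $m)) {
            $existing[$m[1]] = true;
        }
    }

    $out = [];
    $pending = null;
    foreach ($lines as $line) {
        // A level 0 or level 1 line ends the previous NAME structure.
        if ($pending !== null && preg_match('/^[01] /', $line)) {
            $out[] = '1 NAME ' . $pending;
            $pending = null;
        }
        $out[] = $line;

        if (preg_match('/^1 NAME (.+)$/u', $line, $m)) {
            $variant = $convert($m[1]);
            if (!isset($existing[$variant])) {
                $existing[$variant] = true;
                $pending = $variant;
            }
        }
    }
    if ($pending !== null) {
        $out[] = '1 NAME ' . $pending;
    }

    return implode("\n", $out);
}

// Stand-in converter; with intl this could wrap
// transliterator_transliterate('Hans-Hant', ...) instead.
$toTraditional = function (string $text): string {
    return strtr($text, ['伟' => '偉', '礼' => '禮']);
};

echo addVariantNames("0 @I1@ INDI\n1 NAME /伟/思礼\n1 SEX M", $toTraditional), "\n";
```

Running it a second time on its own output adds nothing, so it could be scheduled periodically. The original NAME line is never modified.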

Search engine request for Chinese characters 1 month 3 weeks ago #16

  • xmlf
This problem is simpler. First, you should convert the two kinds of characters that exist in your database into one. It is very easy to convert Simplified to Traditional or Traditional to Simplified.

Search engine request for Chinese characters 1 month 3 weeks ago #17

  • WGroleau
xmlf wrote:
This problem is simpler. First, you should convert the two kinds of characters that exist in your database into one. It is very easy to convert Simplified to Traditional or Traditional to Simplified.
Absolutely not!! Each person’s name goes in the DB the way they spelled it. If I don’t know the way they spelled it, I make the best guess from documents and their culture. I never enter wrong information to make things easier on the computer.

Search engine request for Chinese characters 1 month 3 weeks ago #18

  • xmlf
Why would this be a spelling mistake? The difference between simplified and traditional characters is simply that the way the simplified characters are written is simplified. My family tree comprises 13 books of data covering more than 60,000 people. Handed down since the Qing Dynasty, it has been written in traditional Chinese characters. However, in order to make it easy for people to find, read, and update, I am using Simplified Chinese characters. Is there any problem with this? There is no problem! It is equivalent to the difference between w and W.

The most important reason we use webtrees is to bring together people who are related to our own family, so that we get to know each other and become more united! Let us stick to this together, and keep updating and improving.
Last Edit: 1 month 3 weeks ago by xmlf.

Search engine request for Chinese characters 1 month 3 weeks ago #19

  • WGroleau
You may do that if you wish. I doubt any serious genealogist would approve of changing names to support the software.

Search engine request for Chinese characters 1 month 3 weeks ago #20

  • joeysun
I agree with Wes. I also have genealogy books which go back about 1,500 years with names in traditional format. I input the names from these sources as is. I find that most mainlanders have problems reading traditional characters. According to Quora, as of 2009, 22% of Chinese characters had been simplified. Generally, people in areas where traditional characters dominate are comfortable with simplified characters because of the amount of media generated on the mainland. Chinese males in a patrilineal line can have many names, and webtrees is capable of displaying and searching any number of names. Whenever possible, I append GEDCOM standard tags indicating the type of the name. I still feel it would be cleaner if the search function automatically looked for both traditional and simplified at the same time.

The data should not be changed for the software. The software should help analyze the data.

I have not even requested searching for Chinese homophones.
Last Edit: 1 month 3 weeks ago by joeysun. Reason: clarity and added information