Web based family history software

Using ChatGPT to analyze text and enter relevant information into a tree

  • hermann
  • Topic Author
  • Elite Member
1 year 6 months ago #1 by hermann
The artificial intelligence ChatGPT is able to generate GEDCOM code from any text. For example, you can use the text of a biography, an obituary or a chronicle as input; ChatGPT then identifies the people appearing in the text, their relationships to each other and associated events, such as dates of birth and death, and generates the appropriate GEDCOM code. I documented an example of this in a blog post (in German).

Now, it would be nice to have a webtrees module that takes the unstructured text of a document and, using the ChatGPT API and webtrees' internal raw GEDCOM editing function, directly inserts the people and information found in it into a webtrees tree. Information that already exists in the tree might have to be merged manually with the new information.
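
To make the idea more concrete, here is a minimal Python sketch of the first half of such a workflow: build a prompt, hand it to a text-completion service, and keep only the lines of the answer that are shaped like GEDCOM. Everything named here is an assumption for illustration: the prompt wording is hypothetical (it is not the prompt from the blog post), and `complete` is a placeholder for whatever function actually calls the ChatGPT API. Writing the result into a webtrees tree is not shown.

```python
import re
from typing import Callable

# Hypothetical prompt wording, for illustration only.
PROMPT_TEMPLATE = (
    "Extract all persons, relationships and events from the text below "
    "and answer with GEDCOM records only.\n\nTEXT:\n{text}"
)

# A GEDCOM line is: level, optional @XREF@, tag, optional value.
GEDCOM_LINE = re.compile(r"^\d+ (@[^@ ]+@ )?[A-Z0-9_]+( .*)?$")


def build_prompt(text: str) -> str:
    """Wrap the unstructured source text in the instruction prompt."""
    return PROMPT_TEMPLATE.format(text=text.strip())


def extract_gedcom(text: str, complete: Callable[[str], str]) -> list[str]:
    """Send the prompt to `complete` and keep only lines shaped like GEDCOM.

    `complete` is an assumed stand-in for the real API call: it takes the
    prompt and returns the model's answer as one string.
    """
    answer = complete(build_prompt(text))
    lines = [line.strip() for line in answer.splitlines()]
    return [line for line in lines if GEDCOM_LINE.match(line)]
```

Because the service is passed in as a plain function, it could be swapped for a different provider without touching the rest of the code. The filter only checks the shape of each line, so the records still have to be compared against the source by a human.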

What do you think? Would that be helpful?

Designer of the custom module "Extended Family"

webtrees 2.1.21 (all custom modules installed, PHP 8.3.12, MariaDB 10.6) @ ahnen.hartenthaler.eu

1 year 6 months ago - 1 year 6 months ago #2 by Franz Frese
Can you use ChatGPT to make your module a chart, or do you do that by yourself?

I know there is wrong data on the net. Will ChatGPT correct that?

The gedbas database is not a database at all. It is the same as the summary of all webtrees installations.
Last edit: 1 year 6 months ago by Franz Frese.

1 year 6 months ago #3 by hermann
@FranzFrese: All three comments of your last post are off-topic. This is not helpful.

  • bertkoor
  • Platinum Member
  • Greetings from Utrecht, Holland
1 year 6 months ago #4 by bertkoor
My opinion, a famous quote: good computer programs do only one thing, and do that really well.

I'm sure ChatGPT can take a verbose biography and convert it into GEDCOM format. But personally I think this is out of scope for webtrees. We don't know how long the API will stay online; a successor is published every few months.

And what is more important: getting to a destination, or the journey towards it? For me more than half the fun and intellectual challenge is in finding data and extracting noteworthy details.

stamboom.BertKoor.nl runs on webtrees v2.1.20

1 year 6 months ago #5 by Franz Frese

hermann wrote: @FranzFrese: All three comments of your last post are off-topic. This is not helpful.

Sorry, I guess I wasn't in a good mood!

1 year 6 months ago #6 by Bernat
I find this topic so extraordinary that I cannot see how far it might take us. Just imagining the amount of work it would save us, and how little time we would need for many investigations, seems wonderful to me. I hope that in the near future we can enjoy this utility/module/idea, or whatever we want to call it, in webtrees.

(As an anecdote: at present I scan texts or books to PDF simply so that I can then search for text (names) more quickly, and I also have census lists that I don't really know how to use. Imagine being able to scan them and pass them to ChatGPT so that it works out the relationships and the GEDCOM data for me. That would be incredibly wonderful.)

Greetings, Hermann!

webtrees 2.1.20
MySQL server 8.0.36
Web server: nginx/1.18.0
PHP version 8.3.3
Hosted at webtrees.net

1 year 6 months ago #7 by Luenissla
Hello Hermann and everyone,

I personally don't think that ChatGPT is the programme that can convert such texts into GEDCOM files for you. You will need to check every entry! And errors will spread exponentially, as already happens when suggested matches are adopted without verification.

I would much prefer a webtrees module for indexing civil status registers (Verkartung) through simplified data entry à la DES. ;-) (But that would be another topic.)

Best regards
Hans-Joachim (Lünenschloß)

1 year 6 months ago #8 by hermann

bertkoor wrote: My opinion, a famous quote: good computer programs do only one thing, and do that really well.

I would say "A good software module does only one thing, and does that really well." But the time when a single program or application did everything a user needs in their workflow is over. Many webtrees users already use, for example, the GOV web service. This is much better than adding a database of all historical place names, which is updated every day, to webtrees. Why not use a text analysis service like ChatGPT as a web service inside webtrees? Implementing the kind of AI function that ChatGPT offers in PHP code is nearly impossible.

bertkoor wrote: We don't know for how long the API will be online, there is a successor published each few months.

Maybe, but there will be several similar web services available. So if one is not available any more in the future it should be possible to replace it with another one.

bertkoor wrote: And what is more important: getting to a destination, or the journey towards it? For me more than half the fun and intellectual challenge is in finding data and extracting noteworthy details.

This is an important point. I do genealogical research because I like to puzzle and I like to document. So I'm with you: the journey is why I'm doing this. But I use the online archives instead of visiting a parish archive because it is much more efficient. I do not like reading an old document of one hundred pages in order to identify the few pages where I can find a link to my ancestors. I would prefer to ask ChatGPT to read it for me and add the information to my tree, because that is at least 10 or 100 times more efficient. And I will still enjoy my genealogical work.

This is what Bernat said, too:

Bernat wrote: Just imagining the amount of work it would save us, and how little time we would need for many investigations, seems wonderful to me.

Luenissla wrote: I personally don't think that ChatGPT is the programme that can convert texts into GedCom files for you. You will need to check every entry! And errors will spread exponentially, as it is already happening with unverified takeovers of suggested matches.

If you read my blog post, you will find that ChatGPT converts text into GEDCOM without any "hallucinations" if you use an appropriate prompt. Yes, you have to check everything and compare the distilled information against the source. But this is genealogical business as usual, and nothing new compared to the unverified adoption of suggested matches that you mentioned. And I'm sure that if I had to transfer information from an unstructured text to my tree by hand, my error rate would be no better than ChatGPT's.
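
Part of that comparison against the source can be automated. The following Python sketch is only a rough first pass under simple assumptions: it flags NAME and DATE lines whose name words or four-digit year do not literally occur in the source text. The function name and the matching rule are made up for illustration. It cannot confirm that a value is correct, only point at values that need a closer look.

```python
import re


def unsupported_values(gedcom_lines, source_text):
    """Return GEDCOM lines whose NAME words or DATE year are missing from the source."""
    haystack = source_text.casefold()
    flagged = []
    for line in gedcom_lines:
        parts = line.split(" ", 2)  # level, tag, value
        if len(parts) < 3:
            continue  # lines such as "1 BIRT" carry no value
        tag, value = parts[1], parts[2]
        if tag == "NAME":
            # Drop the slashes that mark the surname, then check every word.
            needles = value.replace("/", " ").split()
        elif tag == "DATE":
            # Only the year is checked; qualifiers like ABT are ignored.
            needles = re.findall(r"\d{4}", value)
        else:
            continue
        if any(needle.casefold() not in haystack for needle in needles):
            flagged.append(line)
    return flagged
```

For a made-up source sentence such as "Anna Muster was born about 1850 in Ulm.", a generated line `2 DATE 1912` would be flagged, while `1 NAME Anna /Muster/` and `2 DATE ABT 1850` would pass.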

1 year 6 months ago #9 by JGerardi

bertkoor wrote: My opinion, a famous quote: good computer programs do only one thing, and do that really well.

hermann wrote: "...but I use the online archive instead of visiting a parish archive because it is much more efficient. I do not like reading an old document of a hundred pages in order to identify a few pages where I can find a link to my ancestors. I would prefer to ask ChatGPT to read it for me and add the information to my family tree. Because it is at least 10 or 100 times more efficient."

Anyone who has ever unearthed the data treasures of the church registers, some of them centuries old, knows what a wealth of data opens up there. Anyone who only wants to learn when someone was born, married or died does not need webtrees or comparable programs. An Excel spreadsheet is enough for that. But such a person also knows nothing about their ancestors, their living conditions and much else that is human.
Do you really believe a research tool would take care of the complex family relationships? There are many points of connection that can be inferred only from the sources.
Do you really believe the program could read, understand and interpret Kurrent scripts? The program does not have the necessary abilities. It merely taps into files that others have worked out with a great deal of expertise. And that is by no means efficient, just boring and unscholarly. Genealogy is a science that deals with the sources so that new insights can emerge. My experiences with ChatGPT are best described as sobering. I am not in favor of letting the new data kraken ("Datenkrake") move into webtrees. "Cave canem", as they said in ancient Rome.

1 year 6 months ago #10 by hermann

JGerardi wrote: Do you really believe the program could read, understand and interpret Kurrent scripts?

Yes, certainly! Have you ever used Transkribus? It does not always work well, but it works more and more often. ChatGPT can then interpret the transcribed text and, in a certain sense, also "understand" it, as I have shown.

Opening up old sources in all their richness, even sources in which the relevant information is very thinly sown, is the goal from my point of view too. If I have a hundred pages of old court records and ChatGPT tells me that exactly on page 42 there is something about persons who interest me, then I find that very helpful. Of course I still have to inspect the pages before and after more closely, but at least I find something to begin with. What is supposed to be boring and unscholarly about that?

I think you have not understood what I want. ChatGPT may be a data kraken with its training data, but that is entirely independent of whether we use the tool or not. I want to use ChatGPT as a tool to analyze texts and, where appropriate, also to generate texts. If you do that properly, it is very successful and inspiring. I do not know what caused your disillusionment; perhaps your expectations were different from mine. I am thrilled.

1 year 6 months ago - 1 year 6 months ago #11 by Bernat
Totally agree with Hermann!

A current example: I am looking for information about an individual, and someone tells me that a certain book/census/list may contain information relevant to my search. At present, my option is to read the book and see whether that is true. One aid, where possible, is to digitize the book and run a text search. Naturally, if a mention turns up, I go straight to the chapter and read the surrounding pages to get an idea.

I imagine that ChatGPT can do (or will be able to do) this digital reading, tell me about the person I am researching and about all the others it finds, and generate GEDCOM entries that I can add directly to my webtrees. (Phew... too good to be true!!!) :-)

Besides, I (sincerely) believe that Bertkoor agrees! :-)


Last edit: 1 year 6 months ago by Bernat.

1 year 5 months ago #12 by WGroleau

hermann wrote: Now, it would be nice to have a webtrees module that takes the unstructured text of a document and directly inserts the people and information found in it into a webtrees tree using the ChatGPT API and the internal webtrees' raw GEDCOM editing function. Already existing information might have to be merged manually with the new information.

What do you think? Would that be helpful?

I do not think so. ChatGPT and its "colleagues" have been caught saying things easily proven false, arguing with and insulting human users, citing sources that do not exist, and contradicting themselves when the human user questioned their output.

On the other hand, having it generate a GEDCOM fragment for me to compare with the source text would be useful, as long as I decide whether to put any of it in my DB.

To that end, I'd love to be able to add a GEDCOM fragment (with renumbering) to my DB rather than editing an exported GEDCOM and re-importing (erase and replace) the entire DB.
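
The renumbering part is easy to sketch. The Python below is a hypothetical illustration, not an existing webtrees feature: it rewrites the cross-reference IDs in a fragment so that none of them collides with an ID already used in the tree. It assumes IDs of the form letters plus digits (such as `@I12@`) and that the set of used IDs is supplied by the caller.

```python
import re

# Matches xrefs such as @I12@ or @F3@ (letters followed by digits).
XREF = re.compile(r"@([A-Za-z]+)(\d+)@")


def renumber_fragment(fragment_lines, used_xrefs):
    """Give every colliding xref in the fragment a new, unused ID.

    Definitions and pointers are rewritten consistently, so links inside
    the fragment stay intact.
    """
    taken = set(used_xrefs)
    mapping = {}

    def fresh(match):
        old = match.group(0)
        if old not in mapping:
            prefix, number = match.group(1), int(match.group(2))
            candidate = old
            # Count upwards until an ID is found that nobody uses yet.
            while candidate in taken:
                number += 1
                candidate = f"@{prefix}{number}@"
            taken.add(candidate)
            mapping[old] = candidate
        return mapping[old]

    return [XREF.sub(fresh, line) for line in fragment_lines]
```

With `@I1@`, `@I2@` and `@F1@` already in use, a fragment that defines `@I1@` and `@F1@` comes back using `@I3@` and `@F2@` instead. Text values that happen to contain something shaped like an xref would be rewritten too, so this is a sketch rather than a safe importer.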

Wes Groleau

1 year 5 months ago #13 by bertkoor
An experiment... I took the narrative data that JustCarmen's module produces for this page from Wes Groleau's site and asked ChatGPT to make GEDCOM from it.

Here is the result of the first generation:
0 @P823@ INDI
1 NAME Pierre Groleau
1 BIRT
2 DATE ABT 1777
2 PLAC Québec, Canada
1 OCCU voyageur (fur trapper) and farmer (later in life)
0 @P964@ INDI
1 NAME Marguerite Coutenais
0 @P1094@ INDI
1 NAME Michel Groleau
1 BIRT
2 DATE 1814
0 @P569@ INDI
1 NAME Charles Groleau
1 BIRT
2 DATE 1834
1 DEAT
2 DATE 1879
0 @P1327@ INDI
1 NAME Marguerite Groleau
1 BIRT
2 DATE 1809
0 @P3618@ INDI
1 NAME Hèléne Groleau
1 BIRT
2 DATE 1825
0 @P3625@ INDI
1 NAME Joseph Groleau
1 BIRT
2 DATE 1820
0 @P3628@ INDI
1 NAME Brigitte Groleau
1 BIRT
2 DATE 1832
1 DEAT
2 DATE 1832
0 @P3634@ INDI
1 NAME Lucile Groleau
1 BIRT
2 DATE 1828
1 DEAT
2 DATE 1831
0 @P823@ FAM
1 HUSB @P823@
1 WIFE @P964@
1 CHIL @P1094@
1 CHIL @P569@
1 CHIL @P1327@
1 CHIL @P3618@
1 CHIL @P3625@
1 CHIL @P3628@
1 CHIL @P3634@
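
One detail in this output is worth a mechanical check before pasting it anywhere: the ID @P823@ is defined twice, once as an INDI record and once as the FAM record. A small Python sketch along the following lines (the function name is made up, and the input is assumed to be one GEDCOM line per list entry) reports duplicate record IDs and pointers to records that are not defined:

```python
import re
from collections import Counter

RECORD = re.compile(r"^0 (@[^@ ]+@) ([A-Z_]+)$")       # e.g. "0 @P823@ INDI"
POINTER = re.compile(r"^\d+ [A-Z_]+ (@[^@ ]+@)$")      # e.g. "1 HUSB @P823@"


def check_xrefs(lines):
    """Return (duplicate record IDs, pointers to undefined records)."""
    defined = Counter()
    pointed = set()
    for line in lines:
        record = RECORD.match(line)
        if record:
            defined[record.group(1)] += 1
            continue
        pointer = POINTER.match(line)
        if pointer:
            pointed.add(pointer.group(1))
    duplicates = sorted(xref for xref, count in defined.items() if count > 1)
    dangling = sorted(pointed - set(defined))
    return duplicates, dangling


# Excerpt of the generated records above (parents, one child, the family).
excerpt = [
    "0 @P823@ INDI", "1 NAME Pierre Groleau",
    "0 @P964@ INDI", "1 NAME Marguerite Coutenais",
    "0 @P1094@ INDI", "1 NAME Michel Groleau",
    "0 @P823@ FAM", "1 HUSB @P823@", "1 WIFE @P964@", "1 CHIL @P1094@",
]
print(check_xrefs(excerpt))  # (['@P823@'], [])
```

A check like this says nothing about whether the people and dates are right; it only catches structural slips of this kind.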

I don't see the need personally to integrate this into webtrees. Nothing stops you from doing this right now.
You have total freedom of choice regarding the tools you use. Copy & Paste is your friend ;-)

1 year 5 months ago #14 by WGroleau
I think I read that you (Hermann) found at least one "hallucination" in your results. Impressive, nevertheless. Another researcher found none in a smaller amount of input: aigenealogyinsights.com/ai-genealogy-use-cases-how-to-guides

Also interesting:

One of the items in the first link mentions using it for translation. I did a web search on that topic and found that several people testing this reported that ChatGPT is consistently worse—but not by much—than Google or DeepL at translation. (And having seen how bad Google is at languages I actually know, …)

Wes Groleau

1 year 5 months ago #15 by Bernat
The document you cite is excellent: aigenealogyinsights.com/2023/06/09/using...ugins-for-genealogy/


1 year 4 months ago #16 by WGroleau

bertkoor wrote: I'm sure ChatGpt can take a verbose biography and convert that into GEDCOM format. But personally I think this is out of scope for webtrees. We don't know for how long the API will be online, there is a successor published each few months.

If I recall correctly, the ChatGPT website says it is free temporarily to collect data to improve it.

bertkoor wrote: And what is more important: getting to a destination, or the journey towards it? For me more than half the fun and intellectual challenge is in finding data and extracting noteworthy details.

When I first put my data online with PhpGedView, what was more important to me was making the family history available to relatives. But it turns out that the relatives won't even look at my site. Not even the ones doing genealogy.

Wes Groleau
