TOPIC: Minimize a GEDCOM

Minimize a GEDCOM 1 week 6 days ago #1

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
I'm looking at uploading a GEDCOM to GEDmatch. I've privatized it OK, but would like to reduce it to the most basic family structure and BDM (births, deaths, marriages) information, mainly by stripping out notes and media. I don't see any way to do that with the webtrees GEDCOM export or the clippings cart. Any suggestions for how I might go about this by other means, or perhaps a cunning regex for search and replace in webtrees or a text editor? Is this perhaps an idea for a new webtrees feature?

Murray Peterson

Minimize a GEDCOM 1 week 6 days ago #2

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
I've managed to remove media and notes with some regular expressions in Notepad++'s search and replace.
I'm not great at the "dark art" of regex, but I used these:
\d.OBJE.@.\d*@ for media links
\d.NOTE.*$ for notes
\d.CONT.*$ for multi-line notes
\d.CONC.*$ for multi-line notes
Probably clumsy, but it did the trick. Then I removed the empty lines.
I'd still be interested to hear how anyone else has done this, and whether it's worth a new feature request for webtrees.
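For comparison, here is a minimal Python sketch of the same clean-up done by GEDCOM level rather than line by line (an illustration only, not a webtrees feature). It assumes a UTF-8 export with one "level [@XREF@] TAG [value]" line per row; the names strip_tags, DROP_TAGS and LINE_RE are hypothetical. Because it drops each NOTE or OBJE line together with every more deeply nested line beneath it, it also removes level-0 note and media records along with their FILE and CONT lines, while leaving alone CONT or CONC lines that belong to other tags (an address, say), which the separate CONT/CONC regexes above would delete.

```python
import re
from collections.abc import Iterable, Iterator

# Hypothetical default: tags to drop together with everything nested under them.
DROP_TAGS = frozenset({"NOTE", "OBJE"})

# One GEDCOM line: level number, optional @XREF@, then the tag.
LINE_RE = re.compile(r"\s*(\d+)\s+(?:@[^@]+@\s+)?(\S+)")


def strip_tags(lines: Iterable[str], drop: frozenset = DROP_TAGS) -> Iterator[str]:
    """Yield GEDCOM lines, skipping each line whose tag is in `drop`
    plus all of its subordinate (higher-level) lines."""
    skip_level = None  # level of the dropped line whose children are being skipped
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            # Blank or malformed line: keep it unless it sits inside a dropped block.
            if skip_level is None:
                yield line
            continue
        level, tag = int(m.group(1)), m.group(2)
        if skip_level is not None:
            if level > skip_level:
                continue  # still nested under the dropped NOTE/OBJE (CONT, CONC, FILE, ...)
            skip_level = None  # back at the same or a shallower level
        if tag in drop:
            skip_level = level
            continue
        yield line
```

To use it on a file, open the export and a new output file as UTF-8 text and call dst.writelines(strip_tags(src)); each kept line is written back unchanged, line ending included.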

Minimize a GEDCOM 1 week 6 days ago #3

  • hermann
  • Premium Member
  • Posts: 610
You can use an enhanced version of the clippings cart. This custom module can, for example, delete all records of type OBJE in one step.
Hermann
Designer of the custom module "Extended Family"

webtrees 2.0.23 (all custom modules installed, php 7.4.15, MySQL 5.6) @ ahnen.hartenthaler.eu

Minimize a GEDCOM 1 week 5 days ago #4

  • Jefferson49
  • New Member
  • Posts: 40
For this kind of GEDCOM "post-processing", I use the tool Gedcom Conversion, which is part of the Gedcom Service Programs. While it needs a license, you get 10 program starts free for testing purposes.

Gedcom Conversion has an option to delete all notes or media; see the screenshot if you follow the link above and click on "Gedcom Conversion".

The Gedcom Service Programs offer a lot of sophisticated options to automatically manipulate GEDCOM files. The configurations for the chosen manipulations can be stored, and there is also the possibility to automatically run a sequence of manipulations in a row.

Based on these tools, I have developed a script and tool environment in which I can automatically download a GEDCOM file from webtrees and manipulate it so that it fully complies with the required standard and with my own requirements.

Minimize a GEDCOM 1 week 5 days ago #5

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
Thanks Hermann, a very useful module. I can capture the whole tree and skip all media objects in one go, although I still need to remove the notes by search and replace.
The TAM and Lineage visualizations are fascinating - I will explore those further.
Thanks again.

Minimize a GEDCOM 1 week 5 days ago #6

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
Hi Jefferson49,
Gedcom Service Programs looks like a very powerful tool. I'll probably stick to manual "post-processing" for the few times I'll need to do this, but I'll keep a note of it.
Thanks.

Minimize a GEDCOM 1 week 3 days ago #7

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
I've come across a final hurdle trying to produce a GEDCOM for upload to GEDmatch.
GEDmatch requires private individuals to be named "Living", but I can't find any way to do this with webtrees.

I've managed to strip down my GEDCOM to very basic BMD data and privatized all living individuals with the export privacy settings.
When privatizing the GEDCOM, webtrees removes all NAME tags from living individuals.
These display as an ellipsis (... ...) within webtrees, and I can easily do a quick hack to display them as "Living" by changing the getFallBackName() function in Individual.php, but of course this has no effect on the GEDCOM itself.

The obvious solution would be to edit the GEDCOM manually, but with many hundreds of records that would be mind-numbing.
Any suggestions would be much appreciated.

Murray

Minimize a GEDCOM 1 week 2 days ago #8

  • Peter_S
  • Senior Member
  • Posts: 332
Hello Murray,

GedTool should be able to meet your requirements. Try the free shareware version, which has full functionality and is only time-limited.
gedtool.de/
Peter

webtrees 2.1.6, vesta modules, chart modules of magicsunday, extended family of hartenthaler
PHP 8.0, MySQL 8.0.16
Webhosting: genonline.de

Minimize a GEDCOM 1 week 2 days ago #9

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
Thank you all for your suggestions. Peter, I will take a look at gedtool.de/.

I had hoped I might find help with this within webtrees, it being open source with such brilliant developers and an active community: perhaps a hint at where I might look in the code for something to adapt.

I would have thought that being able to produce a stripped down privatized GEDCOM, in this case to the requirements of GEDmatch but also for other public family tree projects, would be something others would find useful as well. Perhaps I'm wrong about that.

I will plug on with this myself. I'm wondering if one of the data fixes like "Add missing death records" might be something of a model.

Murray

Minimize a GEDCOM 1 week 1 day ago #10

  • fisharebest
  • Administrator
  • Posts: 16208
> I would have thought that being able to produce a stripped down privatized GEDCOM, in this case to the requirements of GEDmatch but also for other public family tree projects, would be something others would find useful as well. Perhaps I'm wrong about that.

Set the privacy rules accordingly (e.g. hide all facts except BMD), and use the "privatise" option when downloading the GEDCOM file.
Greg Roach - fisharebest.webtrees.net

Minimize a GEDCOM 1 week 1 day ago #11

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
Thanks Greg, I hadn't realised I can hide more than the default records with the Privacy restrictions settings - that makes that part very easy - wonderful.

However, I'm still left with the problem of how to name all private individuals as "Living" as required by GEDmatch.
The "visitor" Privacy Setting when exporting to GEDCOM completely removes the NAME tags, hence no way to name them "Living" even in post-processing the GEDCOM file.
webtrees displays those with the Fallback name but that's no use outside.

That's why, in desperation, I've been looking at things like adapting a data fix to find missing NAME tags and add one with "Living" as the name, or altering a GedcomRecord module to keep an empty or "Living" NAME tag. I'm not all that up to speed with PHP, but I'm not too bad at reusing code, so any suggestions would be helpful.

I mentioned that other people may want to produce GEDCOMs for GEDmatch, with the idea that a new webtrees feature might be useful.

Murray

Last edit: by photon flip.

Minimize a GEDCOM 1 week 1 day ago #12

  • fisharebest
  • Administrator
  • Posts: 16208
You can use a data fix to ensure all dead individuals have a DEAT fact.

Some clever regexes should allow you to match the ones without a DEAT fact and delete the NAME.

Then replace DEAT with NAME living/DEAT

Last edit: by fisharebest. Reason: f***ingredients autocorrect

Minimize a GEDCOM 5 days 3 hours ago #13

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
Hi Greg,
Perhaps I misunderstand your suggestion.

> You can use a data fix to ensure all dead individuals have a DEAT fact.
>
> Some clever regexes should allow you to match the ones without a DEAT fact and delete the NAME.
>
> Then replace DEAT with NAME living/DEAT


If I'm matching records without DEAT facts, then I don't have a DEAT to replace. However, I do see what you are getting at.

As a privatized, exported GEDCOM record for a living individual has no NAME tag, I thought I'd try using a regex to find those records and replace the unused _WT_USER tag with NAME Living, or some variation on that.

Unfortunately I haven't been able to build a regex that does it. I've tried conditional statements, negative assertions, everything but eye of newt.
Search and replace in webtrees has the advantage that it works on individual records rather than the whole GEDCOM, so that's good, but I find regex hard going.

Minimize a GEDCOM 4 days 23 hours ago #14

Usually the first tag is the NAME structure, followed by the SEX tag.
So a regex search for "0 @.*\n1 SEX" that inserts a 1 NAME line before the SEX tag could match more than 90% of the records.

Review the matches carefully before updating all of them.
Ladislav

webtrees 2.0.24 + ⚶ Vesta modules (from cissee.de/)
testing webtrees 2.1.5 + ⚶ Vesta modules
on PHP Version 7.4.28

Minimize a GEDCOM 4 days 6 hours ago #15

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
Thanks Ladislav,
That's an interesting way of doing that. My GEDCOM has a different sequence:
0 @I1071@ INDI
1 FAMC @F395@
1 SEX F
I've figured out a regex for that, 1 FAMC.*\n1 SEX.*, which finds almost all nameless individuals, but it needs to be non-capturing to work, (?:1 FAMC.*\n1 SEX.*), and webtrees' regex doesn't seem to honour the non-capture (?:).
I get:
0 @I1071@ INDI
1 FAMC @F395@
1 SEX F
1 NAME Living

What am I doing wrong?

> Usually the first tag is the NAME structure, followed by the SEX tag.
> So a regex search for "0 @.*\n1 SEX" that inserts a 1 NAME line before the SEX tag could match more than 90% of the records.
>
> Review the matches carefully before updating all of them.
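For what it's worth, a non-capturing group (?:...) only affects grouping and group numbering; the text it matches is still part of the overall match and is still replaced. To keep matched lines, the replacement has to write them back, either through a reference to the whole match or to a capturing group. The small Python illustration below uses the record above. Python's re module stands in here for the PCRE engine that webtrees presumably uses through PHP, and for the Boost engine in Notepad++; in those replacement strings the whole match is normally written $0 rather than Python's \g<0>.

```python
import re

record = "0 @I1071@ INDI\n1 FAMC @F395@\n1 SEX F\n"

# (?:...) groups without capturing, but the matched text is still replaced:
# the FAMC and SEX lines disappear.
lost = re.sub(r"(?:1 FAMC .*\n1 SEX .*)", "1 NAME Living", record)

# \g<0> re-inserts the whole match, so the new line is appended after it.
kept = re.sub(r"(?m)^1 FAMC .*\n1 SEX .*$", r"\g<0>\n1 NAME Living", record)

print(lost)
print(kept)
```

Like the original pattern, this only catches records where FAMC is immediately followed by SEX, so it is a heuristic rather than a test for a missing NAME.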

Minimize a GEDCOM 4 days 9 minutes ago #16

  • TheDutchJewel
  • Junior Member
  • Posts: 117
[code removed, see note]

Note: I tested the code on regex101.com, but it does not work in webtrees, because webtrees only looks in the INDI record of each individual, and each INDI record contains only one "0 INDI" tag.
Therefore, use a text editor such as Notepad++, which looks in the entire GEDCOM file.
The proper search/replace pattern for Notepad++ can be found here.
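If webtrees does apply the pattern to one record at a time, as described above, a webtrees pattern cannot rely on seeing the next "0 ... INDI" line; it has to decide from the record alone. One way to express that is to anchor at the start of the record and add a negative lookahead that rejects any record containing a "1 NAME" line. The Python sketch below emulates per-record matching by splitting the file before each level-0 line. The helper names are hypothetical, and whether webtrees' search and replace accepts exactly this pattern (in particular the \A anchor) has not been tested here.

```python
import re

# Matches the INDI header at the very start of a record, but only if no later
# line in the same record is a level-1 NAME line.
NAMELESS_HEADER = re.compile(r"\A(0 @[^@]+@ INDI)(?![\s\S]*\n1 NAME\b)")


def split_records(gedcom: str) -> list[str]:
    """Split GEDCOM text into level-0 records, each starting with its '0 ...' line."""
    return re.split(r"\n(?=0 )", gedcom)


def name_living(record: str) -> str:
    """Add '1 NAME Living' directly after the header of a nameless INDI record."""
    return NAMELESS_HEADER.sub(r"\g<1>\n1 NAME Living", record, count=1)


def name_living_per_record(gedcom: str) -> str:
    """Apply name_living to each record separately, then reassemble the file."""
    return "\n".join(name_living(r) for r in split_records(gedcom))
```

Records that already have a NAME, and non-INDI records such as HEAD, FAM or TRLR, come back unchanged.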
- webtrees 2.0.19 (not 2.0.23, because of the missing fix for the too-strict Markdown implementation)
  + PHP 7.4.29 | modules: Rural theme, Vesta Classic Look & Feel, Extended Family, Fancy Research Links, Family Tree Home Page, ℍ&ℍwt MultTreeView
- testing webtrees 2.1.7

Last edit: by TheDutchJewel.

Minimize a GEDCOM 3 days 21 hours ago #17

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48

> Search:
> (0.*INDI)((\n[1-5] (?!NAME).*)*)\n0.*INDI
> Replace:
> $1\n1 NAME Living$2

Thanks but I'm afraid that's not doing it.
The task is to find records without a NAME tag and append one without replacing the searched pattern used to make the match.
Here is an example of a section of GEDCOM with an unNAMEd and NAMEd record.
0 @I0048@ INDI
1 FAMC @F0009@
1 SEX F
0 @I0050@ INDI
1 NAME Joe /Blogs/
2 GIVN Joe
2 SURN Blogs
1 SEX M
1 FAMC @F0010@
2 PEDI birth
1 BIRT
2 DATE 1913
1 DEAT Y
1 FAMS @X730@
1 CHAN
2 DATE 22 APR 2021
3 TIME 21:29:14

The task is to find records like the first three lines:
0 @I0048@ INDI
1 FAMC @F0009@
1 SEX F
and append a new tag: 1 NAME Living.
I can do the finding, but not without also replacing the matched lines "1 FAMC @F0009@" and "1 SEX F".

Murray

Minimize a GEDCOM 3 days 21 hours ago #18

  • TheDutchJewel
  • Junior Member
  • Posts: 117
It does exactly what you ask: it looks for the INDIs without a NAME and adds a "1 NAME Living" to it.

Note that the syntax for webtrees and Notepad++ is slightly different. For Notepad++, you need to replace the "\n" with a "\R" in the search pattern and with a "\r\n" in the replace pattern.

Last edit: by TheDutchJewel.

Minimize a GEDCOM 3 days 20 hours ago #19

  • photon flip
  • Topic Author
  • New Member
  • Posts: 48
I can assure you I have tested your solution.
I've been using the webtrees search and replace, as it operates on individual records and that seemed like an advantage, but your regex doesn't work there: no records found.
I have just now tried it with Notepad++ and I do get finds, but it also catches the INDI tag of the next record and removes it on the replace.
If you can figure out how to fix that, you've cracked it!!!
I'll also try some variations of your regex on regex101.com.
I do appreciate your help.

Murray

Minimize a GEDCOM 3 days 20 hours ago #20

  • TheDutchJewel
  • Junior Member
  • Posts: 117

> I can assure you I have tested your solution.
> I have just now tried it with Notepad++ and I do get finds but it also catches the INDI tag of the next record and removes that on the replace.

You're right, I forgot to group the last part.

Use this in Notepad++:

search:
(0.*INDI)((\R[1-5] (?!NAME).*)*)(\R0.*INDI)
replace:
$1\r\n1 NAME Living$2$4
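One remaining catch, as far as I can tell (assuming Notepad++'s ". matches newline" option is off): the last group still consumes the next record's "0 ... INDI" line. So when two nameless records are adjacent, the second is only picked up by a second Replace All, and a nameless INDI followed by a FAM record or the trailer is never matched. A lookahead avoids both by checking for the next level-0 line without consuming it. Below is a Python sketch of that variant; the lookahead is a modification, not the pattern posted above, the names NAMELESS_INDI and add_living_names are hypothetical, and since Python's re has no \R it assumes the file was read in text mode, which turns CRLF into \n.

```python
import re

# Whole-file variant: the end of each record is found with a lookahead,
# so the next record's header is left in place for the next match.
NAMELESS_INDI = re.compile(
    r"^(0 @[^@]+@ INDI)"           # group 1: the INDI header line
    r"((?:\n(?!0 |1 NAME\b).*)*)"  # group 2: the record's other lines, none of them a level-1 NAME
    r"(?=\n0 |\Z)",                # next line is level 0 (any record type), or end of text
    re.MULTILINE,
)


def add_living_names(gedcom: str) -> str:
    """Insert '1 NAME Living' after the header of every INDI record without a NAME."""
    return NAMELESS_INDI.sub(r"\g<1>\n1 NAME Living\g<2>", gedcom)
```

Running it a second time changes nothing, because every INDI record then has a NAME line.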

Last edit: by TheDutchJewel.