Web based family history software

Question Concepts for organizing a genealogy in webtrees as an archive

2 years 9 months ago - 2 years 9 months ago #1 by Jefferson49
Hello webtrees users,

I am currently halfway through reorganizing my genealogical data, documents, and photos with webtrees. After restructuring my digital files and folders as well as my physical documents, photos, and folders, I realized that I have to think of my genealogy as an “archive” and of webtrees as an important tool for organizing that archive. I found it very helpful to look at my own archive the way I would look at a public archive that I visit: it needs to be organized so that someone else (and I myself!) can find useful information efficiently.

In the following paragraphs, I want to share some of the concepts that I have applied. Afterwards, I will share some draft ideas and concepts for further enhancements.

I would be happy if you could provide some feedback or share some of your experiences.

Some steps to take to organize an archive within webtrees:
  1. Design a directory structure for your archive (in any text editor, word processor, or spreadsheet).
  2. Create a dedicated repository <My Archive> as a REPO record to represent your archive in webtrees.
  3. Enter all the sources of your genealogy into webtrees as SOUR records and assign them to <My Archive>.
  4. For each source, fill in the Call Number for <My Archive>.
  5. Include the directory from step 1 (or better: abbreviations of it) in the Call Number of the sources in <My Archive>, e.g. “/Biogr/Mil/MilHe/Nr. 5” for the 5th source in the directory “Biographies/Miller/Miller, Henry/”. Note: This is a fairly common way to organize call numbers, which is also used by professional archives.
  6. Put all the media files related to a source into a webtrees media folder that directly corresponds to your archive directory structure from step 1 and your Call Number structure from step 5, e.g. “../webtrees/data/media/Biographies/Miller/Miller, Henry/Nr. 5 - University Degree Henry Miller from 1920.pdf”.
  7. Use the same directory structure for your physical folders, documents, and photos, e.g. put a physical folder labelled “Biographies” on your bookshelf and arrange the documents within it in exactly the same way as the archive structure from step 1 (and also steps 5 and 6).
  8. To maintain and rearrange the directory structure in webtrees, use “Control panel, Manage family trees, Data fixes, Search and replace”, e.g. search for “/Biogr/Mil/MilHe/” and replace with “/Biogr/UK/MilHe/”.
  9. Enhance your source data with information about content, places, and date period, e.g. use the available webtrees (and GEDCOM) fields “Data”, “Data/Date”, “Data/Place”, “Data/Note”, and “Data/Event” within the webtrees source records. Note: “Data/Note” can be used for a general description of the source.
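
The naming scheme from steps 5, 6, and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abbreviation table and the function names are made up for this example and are not part of webtrees, and `move_branch` only handles a prefix, whereas the data fix in webtrees works on the stored records.

```python
from pathlib import PurePosixPath

# Hypothetical abbreviation table: full directory name -> short form.
ABBREVIATIONS = {
    "Biographies": "Biogr",
    "Miller": "Mil",
    "Miller, Henry": "MilHe",
}


def call_number(directory: str, item_no: int) -> str:
    """Build a call number such as '/Biogr/Mil/MilHe/Nr. 5' from a directory."""
    parts = PurePosixPath(directory).parts
    # Fall back to the full name when no abbreviation is defined.
    short = [ABBREVIATIONS.get(part, part) for part in parts]
    return "/" + "/".join(short) + f"/Nr. {item_no}"


def media_path(directory: str, item_no: int, description: str, ext: str) -> str:
    """Build the matching file path below the webtrees media folder."""
    return f"{directory.strip('/')}/Nr. {item_no} - {description}.{ext}"


def move_branch(call_no: str, old_prefix: str, new_prefix: str) -> str:
    """Mimic a 'search and replace' on the prefix of a single call number."""
    if call_no.startswith(old_prefix):
        return new_prefix + call_no[len(old_prefix):]
    return call_no


directory = "Biographies/Miller/Miller, Henry/"
print(call_number(directory, 5))
print(media_path(directory, 5, "University Degree Henry Miller from 1920", "pdf"))
print(move_branch("/Biogr/Mil/MilHe/Nr. 5", "/Biogr/Mil/MilHe/", "/Biogr/UK/MilHe/"))
```

Keeping the abbreviations in one table means the call numbers, the media folder, and the physical folder labels can all be derived from the same directory structure.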

Ideas for further steps and concepts:
  • Generate a finding aid (unfortunately with several manual steps, but perhaps only once a year): e.g. export the webtrees data via GEDCOM, convert it to CSV or XLSX, do some spreadsheet or database magic, and generate a report (e.g. an MS Access report). Note: A “finding aid” is a standard overview document that archives provide so that other users (and you yourself!) can quickly find sources and understand the structure of the archive. If all the data mentioned above is available in webtrees, it is a perfect basis for a finding aid.
  • Automatically generate a finding aid as a webtrees report: e.g. take the webtrees data described above and put it into a webtrees report. This would be a great webtrees feature, but it needs someone to develop a plugin module for webtrees.
  • Use freely available professional archive software, e.g. AtoM ( www.accesstomemory.org/ ), which has a LAMP-style architecture similar to webtrees: export the webtrees source data via GEDCOM, convert it to CSV or XLSX, do some spreadsheet or database magic, generate a CSV file according to the AtoM CSV import template, import it into AtoM, use the full feature set of the AtoM archive software, use (generated) hyperlinks from AtoM records to webtrees sources, and use the AtoM round-trip feature for subsequent CSV re-imports based on UIDs.
  • Automatically create AtoM csv export/import files with a webtrees plug-in in order to automate AtoM import and round trip data exchange.
  • Generate professional finding aid: import data to AtoM, generate a finding aid from AtoM.
  • Generate standardized archive EAD xml file: import data to AtoM, generate standardized EAD xml file from AtoM.
  • Make your archive public by providing an EAD xml file to an archive portal, e.g. www.archivesportaleurope.net
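
The first idea (GEDCOM export, conversion to CSV, finding aid) could be prototyped roughly as follows. This is a minimal sketch, not a full GEDCOM parser: it assumes each source has a single REPO citation with one CALN line, ignores CONT/CONC continuation lines, and the sample data and function names are invented for the illustration.

```python
import csv
import io

# Invented sample data in the shape of a webtrees GEDCOM export.
SAMPLE_GEDCOM = """\
0 @R10@ REPO
1 NAME My Archive
0 @S1000@ SOUR
1 TITL University Degree Henry Miller from 1920
1 REPO @R10@
2 CALN /Biogr/Mil/MilHe/Nr. 5
0 @S1001@ SOUR
1 TITL Letter to Henry Miller
1 REPO @R10@
2 CALN /Biogr/Mil/MilHe/Nr. 2
"""


def read_sources(gedcom_text: str) -> list[dict]:
    """Collect XREF, title and call number of every SOUR record."""
    sources = []
    current = None
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            # A new top-level record starts; keep it only if it is a source.
            is_source = len(parts) == 3 and parts[2] == "SOUR"
            current = {"xref": parts[1], "title": "", "call_number": ""} if is_source else None
            if current is not None:
                sources.append(current)
        elif current is not None:
            tag = parts[1]
            value = parts[2] if len(parts) == 3 else ""
            if level == "1" and tag == "TITL":
                current["title"] = value
            elif level == "2" and tag == "CALN":
                current["call_number"] = value
    return sources


def finding_aid_csv(sources: list[dict]) -> str:
    """Return a CSV listing of the sources, ordered by call number."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["call_number", "title", "xref"], lineterminator="\n"
    )
    writer.writeheader()
    for row in sorted(sources, key=lambda s: s["call_number"]):
        writer.writerow(row)
    return buffer.getvalue()


print(finding_aid_csv(read_sources(SAMPLE_GEDCOM)))
```

Note that the plain string sort would place “Nr. 10” before “Nr. 2”; a real finding aid would need zero-padded numbers or a natural sort.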
Last edit: 2 years 9 months ago by Jefferson49.

2 years 9 months ago #2 by hermann
I like your well-structured approach!

I have two different types of archives: physical and virtual. The first consists of two boxes and a bookshelf at home, the second of a folder on my NAS (file server) and my e-mail archive. At the moment, my archive box is organized so that the newest or most recently used documents are on top, so point 1 from your list is on my to-do list. For my virtual archives, I'm a fan of flat storage, i.e. no directory structure. There are only two folders, "done" and "to-do". I prefer tagging all documents, because there are so many different views of the photos and documents, and a directory structure supports only one view, which in my opinion is not flexible enough. For me, it is easier to find something by searching than by following a predefined directory structure.

Your point 1: I have to do this for my archive boxes, but I will not do it for my virtual archives.
Your points 2, 3, 4, and 9: done

Regarding your further steps: I just had a look at AtoM, an interesting tool! Thank you for this hint.

I had never heard of www.archivesportaleurope.net, but when I opened it just now, I found some very interesting links to documents in an archive I didn't know about. Cool!

Hermann
Designer of the custom module "Extended Family"

webtrees 2.1.20 (all custom modules installed, PHP 8.2, MariaDB 10.6) @ ahnen.hartenthaler.eu

2 years 9 months ago #3 by Jefferson49

hermann wrote:
> For my virtual archives, I'm a fan of flat storage, i.e. no directory structure. There are only two folders "done" and "to-do". I prefer tagging all documents because there are so many different views regarding the photos and documents and a directory structure supports only one view and this is in my opinion not flexible enough. To find something it is easier for me to search instead of following a predefined directory structure.

The point about multiple views of the virtual part of the archive is a really good argument. I have to think about this. Maybe it is also possible to use a combination, i.e. one view by directory and further views by additional attributes.

From your experience: do you have suggestions for a standard approach to document attributes, e.g. using the attributes of PDF, JPEG, ... files and an openly available search tool?
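
To make the idea of combining both views concrete, here is a small Python sketch. The catalogue, the tags, and the function names are hypothetical; in practice the tags would come from file metadata or a tagging tool rather than a hard-coded dictionary.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical catalogue: file path -> free-form tags.
DOCUMENTS = {
    "Biographies/Miller/Miller, Henry/Nr. 5 - University Degree.pdf": {"certificate", "1920"},
    "Biographies/Miller/Miller, Henry/Nr. 6 - Portrait.jpg": {"photo", "1920"},
    "Letters/Miller/Nr. 1 - Letter.pdf": {"letter"},
}


def by_directory(documents: dict) -> dict:
    """The single view that a directory structure gives: folder -> file names."""
    view = defaultdict(list)
    for path in sorted(documents):
        pure = PurePosixPath(path)
        view[str(pure.parent)].append(pure.name)
    return dict(view)


def by_tag(documents: dict) -> dict:
    """Additional views from attributes: tag -> full paths."""
    view = defaultdict(list)
    for path in sorted(documents):
        for tag in documents[path]:
            view[tag].append(path)
    return dict(view)


print(by_directory(DOCUMENTS))
print(by_tag(DOCUMENTS)["1920"])
```

The directory keeps one canonical location per document, while any number of tag views can be derived without moving files.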

2 years 9 months ago #4 by Jefferson49

hermann wrote:
> I had never heard of www.archivesportaleurope.net, but when I opened it just now, I found some very interesting links to documents in an archive I didn't know about. Cool!


The archive portals really offer great possibilities. Since more and more archives have digital catalogs, they can share them and make them searchable for a wider audience.

There is also an interesting archive portal in Germany: www.archivportal-d.de

2 years 9 months ago - 2 years 9 months ago #5 by miqrogroove
I've taken a different approach because of a need for portability and standardization.

Other than photos, all "archives" have been placed via Apache Alias directive under "/lib" and meaningful subdirectories. In essence, the document library lies outside of webtrees.

Where each item is referenced in webtrees, I use a markdown link or links directly to the items.

To help with semantics and cross referencing, I place the markdown within a shared note when needed.

Pros:
  • Native index browsing.
  • Intuitive URL structure.
  • Intuitive file-based management.
  • URLs can be permanent, and would not change if webtrees changes.
  • Markdown links survive the export process.
  • Shared notes provide rich metadata and cross referencing.

Cons:
  • MD Links are a one-way mechanism, so there's no direct navigation from the index browser back into webtrees.
  • The webtrees media firewall is bypassed, which might or might not be desirable.
  • The webtrees media and thumbnail system are stored and managed separately, which might or might not be desirable.
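
As an illustration of the markdown links described above, the following Python sketch builds a link to a document served under "/lib". The function name, the example path, and the choice to percent-encode the path are assumptions made for this example, not something taken from the original setup.

```python
from urllib.parse import quote


def markdown_link(title: str, relative_path: str, base_url: str = "/lib") -> str:
    """Return a markdown link to a document served outside webtrees."""
    # quote() leaves '/' untouched but percent-encodes spaces, commas, etc.
    url = base_url.rstrip("/") + "/" + quote(relative_path.lstrip("/"))
    return f"[{title}]({url})"


print(markdown_link("University degree", "biographies/miller-henry/degree 1920.pdf"))
```

The resulting text can be pasted into a shared note or directly into a citation.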
Last edit: 2 years 9 months ago by miqrogroove.

2 years 6 months ago #6 by miqrogroove
Has anyone tried setting up CollectiveAccess as an alternative to the AtoM system?

2 years 6 months ago #7 by Jefferson49
Thanks for the hint! When I ran an internet search for open source archive systems, I did not find CollectiveAccess. After reading parts of the manual, I think it sounds promising. It seems to provide structures for an archive, but might offer better media handling than AtoM. It seems to come more from the gallery side, with a focus on media management, whereas AtoM's approach is centered on the archive, with (limited) media handling.

According to the manual, CollectiveAccess provides sophisticated import/export features and web based APIs, which might offer possibilities for data exchange with webtrees and GEDCOM.

Having spent significant time on a test installation of AtoM, I am not encouraged by the list of installation requirements for CollectiveAccess to try it right away. Unfortunately, these archive and media management systems are much more complex than webtrees. I will put it on my list for a weekend when I am in the mood for new experiments.

I would also be very interested to hear whether anyone has experience with it.

2 years 6 months ago #8 by miqrogroove
Many of the media support libraries are optional and could be added as needed. The admin UI shows which ones are working or missing. I think the main lift with CollectiveAccess is understanding the install profiles well enough to end up with a good default setup.

I haven't looked under the hood of the front end system yet.

I'm not going to need import features because I haven't documented my archives in any way yet. I do wonder, though, how easy or difficult it would be to set up common IDs or call numbers so that webtrees and an archive system can deep link to each other without duplication of labor.

2 years 4 months ago - 2 years 4 months ago #9 by miqrogroove
Trying to figure out the remaining requirements for Collective Access. The "front end" doesn't have a shred of documentation except for upgrading from older versions.

Just by looking at the default setup.php settings, I get an impression that the "front end" and "back end" applications are entirely separate scripts and intended to run on separate websites (sub-domains, maybe??). They would share a database and file storage, but the site-level settings could be different for each.

The "back end" documentation says to install it "to the root of the web server instance". So I don't think it would be good to try to put it on the same site as webtrees. And now that means we're talking about running 3 totally separate websites.

Edit: Curve ball. There's a relative root path variable mentioned in app/helpers/post-setup.php of both packages that says, "attempts to determine the relative URL path automatically." It wasn't mentioned in the documentation. This should offer plenty of flexibility.
Last edit: 2 years 4 months ago by miqrogroove.

2 years 4 months ago - 2 years 1 month ago #10 by miqrogroove
Solved a major roadblock with Collective Access.

The back end docs are here:

manual.collectiveaccess.org/setup/Installation.html

But the front end docs currently live in a separate place that I couldn't find without help from multiple other resources.

web.archive.org/web/20210214194551/https...nstalling_Pawtucket2

It makes so much sense now.
Last edit: 2 years 1 month ago by miqrogroove. Reason: Original link was broken

2 years 1 month ago #11 by Jefferson49
Just recently, I published a new webtrees custom module, "Repository Hierarchy", to support some of the use cases that I described in my initial post.

2 years 1 month ago - 2 years 1 month ago #12 by miqrogroove
Wow, that module must have been a huge accomplishment.

The direction I've taken is slightly different from before.

After setting up CollectiveAccess, I noticed it does a very good job of maintaining hierarchical "collections" (repositories and sources in gedcom jargon) as a separate table where they can be managed independently of other data.

As I add "objects" to the archive, they are assigned an autonumbered URL that is also independent of the object data. So I am continuing the use of Shared Notes in webtrees, and simply adding a markdown link to the archive object's URL, then using the shared note on individual source citations.

Within CollectiveAccess, I figured out how to add a list of links to the object editor, so I can also manually link an object back to the webtrees individuals. This part might be possible to automate as it is a minor duplication of info. However, I use the same feature to store links to external sites where I've found microfilm images and indexes and such.
Last edit: 2 years 1 month ago by miqrogroove.

2 years 1 month ago #13 by Jefferson49

miqrogroove wrote:
> As I add "objects" to the archive, they are assigned an autonumbered URL that is also independent of the object data. So I am continuing the use of Shared Notes in webtrees, and simply adding a markdown link to the archive object's URL, then using the shared note on individual source citations.

> Within CollectiveAccess, I figured out how to add a list of links to the object editor, so I can also manually link an object back to the webtrees individuals. This part might be possible to automate as it is a minor duplication of info. However, I use the same feature to store links to external sites where I've found microfilm images and indexes and such.

First of all, I think that manual linking (with URLs from webtrees to the archive management tool and back) is a good and pragmatic approach. For private genealogical archives with a smaller number of items/sources, the additional effort of inserting the links is probably acceptable. And the resulting overall solution, with links in both directions, is already very promising.

Regarding (partly automated) export/import, I tested the following approach with AtoM, which might also apply to CollectiveAccess. CA even seems to have a more sophisticated import interface. For AtoM, I tested the following process:
  • Gedcom export of the repository and sources from webtrees
  • Gedcom to Excel converter, e.g. GedTool
  • Some Excel transformation into the required import format
  • Use of the import features of the archive management system

However, while this showed that export/import is feasible in principle, the process is too complicated.

While CollectiveAccess and AtoM seem to focus on specific Excel/CSV formats, it would be better to use a standard format that works with different archive management systems. One that is available is EAD XML. In the documentation of both CollectiveAccess and AtoM, I found that EAD XML can be imported.

2 years 1 month ago - 2 years 1 month ago #14 by miqrogroove
In my estimation, managing the linkage between shared notes and archival objects via the archive object URL relieves webtrees of dealing with call numbers entirely. Whether or not this is a good thing depends on whether there is a need to maintain call numbers in webtrees because, for example, it might be necessary to replace the archival system some day.

Edit: I just saw there's a NOTE:REFN tag that might be appropriate to contain call numbers. In Gedcom 7 it changes to SNOTE:REFN. I might experiment with this.

Edit 2: In the CollectiveAccess front end, one alternative URL structure is /MultiSearch/Index?search=<identifier> which will redirect to the object URL. But it works in some situations and not others. I would need a more specific call number redirector. In webtrees, the NOTE:REFN field is not even displayed on individual facts/events. It would require some custom code to make it show up and point to the archive search address.
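
For illustration, the search address mentioned in Edit 2 could be assembled like this in Python. The base URL and the function name are placeholders; only the "/MultiSearch/Index?search=" part comes from the post, and as noted there the redirect does not work in every situation.

```python
from urllib.parse import urlencode


def archive_search_url(base_url: str, identifier: str) -> str:
    """Build a front-end search URL for a given identifier or call number."""
    # urlencode() escapes spaces and slashes inside the identifier.
    query = urlencode({"search": identifier})
    return f"{base_url.rstrip('/')}/MultiSearch/Index?{query}"


print(archive_search_url("https://archive.example.org", "2023.1.5"))
```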
Last edit: 2 years 1 month ago by miqrogroove.

2 years 1 month ago - 2 years 1 month ago #15 by Jefferson49

miqrogroove wrote:
> Edit: I just saw there's a NOTE:REFN tag that might be appropriate to contain call numbers. In Gedcom 7 it changes to SNOTE:REFN. I might experiment with this.

> Edit 2: In the CollectiveAccess front end, one alternative URL structure is /MultiSearch/Index?search=<identifier> which will redirect to the object URL. But it works in some situations and not others. I would need a more specific call number redirector. In webtrees, the NOTE:REFN field is not even displayed on individual facts/events. It would require some custom code to make it show up and point to the archive search address.

For call numbers in Gedcom, there is a CALN tag within the SOURCE_REPOSITORY_CITATION.

A source record may look like this:
Code:
0 @S1000@ SOUR
1 TITL Title
1 REPO @R10@
2 CALN Fonds A / Record Group 1 / Series 12 / Nr. 7
1 NOTE @N2000@

A (partly) automated synchronisation process between webtrees and an archive management tool (AMT) could be designed as follows:
  • Export EAD XML from webtrees, containing the data of a Gedcom repository with its source records as well as the XREFs of the sources and the webtrees rest URLs of the sources
  • Import the EAD XML into the AMT; in the simplest case, only new sources are imported and existing sources are skipped. The source XREFs could be used as unique identifiers
  • The AMT needs to transfer the webtrees URLs to a suitable data structure in its source records and show the webtrees link in the AMT front end
  • ... do some archive work in the AMT ...
  • Export EAD XML from the AMT, containing a rest URL for each source in the AMT
  • Import the EAD XML into webtrees
  • Run a data fix service in webtrees to update the call numbers (CALN), add the AMT URLs to Gedcom notes, and link the Gedcom notes to the Gedcom sources. The data fix service would require some intelligence for syncing/updating
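
The first step of this process (an EAD XML export) might look roughly like the Python sketch below. It is a rough illustration under several assumptions: the element names (ead, archdesc, dsc, c, did, unitid, unittitle, dao) follow EAD 2002, but the output omits the header, is not validated against the EAD schema, and is not checked against what AtoM or CollectiveAccess actually accept. The source dictionary layout, the example URL, and the function name are invented.

```python
import xml.etree.ElementTree as ET


def sources_to_ead(repository_name: str, sources: list[dict]) -> str:
    """Serialise source records as a minimal, unvalidated EAD-like document."""
    ead = ET.Element("ead")
    archdesc = ET.SubElement(ead, "archdesc", level="fonds")
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = repository_name
    dsc = ET.SubElement(archdesc, "dsc")
    for src in sources:
        # The Gedcom XREF (without the @ signs) serves as the unique identifier.
        item = ET.SubElement(dsc, "c", level="item", id=src["xref"].strip("@"))
        item_did = ET.SubElement(item, "did")
        ET.SubElement(item_did, "unitid").text = src["call_number"]
        ET.SubElement(item_did, "unittitle").text = src["title"]
        # Link back to the source in webtrees (hypothetical URL layout).
        ET.SubElement(item_did, "dao", href=src["url"])
    return ET.tostring(ead, encoding="unicode")


example = [{
    "xref": "@S1000@",
    "title": "Title",
    "call_number": "Fonds A / Record Group 1 / Series 12 / Nr. 7",
    "url": "https://webtrees.example.org/tree/demo/source/S1000",
}]
print(sources_to_ead("My Archive", example))
```

Using the XREF as the item id is what would allow a later re-import to recognise existing sources and skip or update them.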

As an alternative to using Gedcom notes, the AMT URLs might also be appended to the call numbers, e.g.:
2 CALN my_amt_call_number, http://my_amt_url

webtrees recognizes URLs within call numbers and the link can be directly clicked in the webtrees front end.
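
A data fix or sync script would need to take such combined values apart again. The following Python sketch shows one possible way; the regular expression and the function name are assumptions for this illustration and do not reflect how webtrees itself detects URLs.

```python
import re

# Simplified pattern: anything from http:// or https:// up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def split_call_number(caln: str):
    """Split a CALN value into (call number, URL or None)."""
    match = URL_PATTERN.search(caln)
    if match is None:
        return caln.strip(), None
    # Drop the separating comma and spaces before the URL.
    call_no = caln[:match.start()].rstrip(" ,")
    return call_no, match.group(0)


print(split_call_number("my_amt_call_number, http://my_amt_url"))
print(split_call_number("Fonds A / Record Group 1 / Series 12 / Nr. 7"))
```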
Last edit: 2 years 1 month ago by Jefferson49.

2 years 1 month ago - 2 years 1 month ago #16 by miqrogroove
You're generating a "source record" in webtrees for every "object" in the archive?

My gedcom has always used source records for the series level collections. That's why I'm using source citations and shared notes to represent most of the objects.

Also, the same concern applies to CALN: webtrees does not pull that value into the facts/events tab. This only happens with the PAGE value in citations, which in turn doesn't apply to notes. That leads me back to the need for either markdown or custom code.
Last edit: 2 years 1 month ago by miqrogroove.

2 years 1 month ago #17 by Jefferson49

miqrogroove wrote:
> You're generating a "source record" in webtrees for every "object" in the archive? My gedcom has always used source records for the series level collections.

Basically, I create one source for one object. However, there is always room for case-by-case decisions, e.g. whether to define a physical folder of certificates as one source or each certificate in the folder as a separate source. Overall, I have 470 sources in my webtrees database, of which 180 belong to my private archive.

miqrogroove wrote:
> That's why I'm using source citations and shared notes to represent most of the objects.

In my case, it seems to be very similar. I also use source citations for linking sources with Gedcom facts, individuals, etc. Indeed, I consider this the major advantage of using sources in webtrees. Overall, I count about 1700 source citations in my webtrees database.

At the moment, I do not use shared notes.

miqrogroove wrote:
> Also, same concern applies to CALN where webtrees does not pull that value into the facts/events tab.

I guess your point is about the accessibility of the links in the webtrees front end, and that links to an external archive management tool should be directly visible/clickable. What we would like to see is a tag in the facts/events views with a direct http:// link.

I understand your point that CALN is not shown in facts/events. It is only indirectly available by opening the link to the source first and then clicking on the call number. If the call number contains the URL, it is 2 clicks away.
Code:
2 CALN my_amt_call_number, http://my_amt_url

Have you found a way to get a one click solution with shared notes? I did some testing in the facts/events tabs. If inserting a shared note close to a source, it is possible to click the link to the shared note. Afterwards, I need a second click to open an external URL.

2 years 1 month ago - 2 years 1 month ago #18 by miqrogroove

Jefferson49 wrote:
> Have you found a way to get a one click solution with shared notes? I did some testing in the facts/events tabs. If inserting a shared note close to a source, it is possible to click the link to the shared note. Afterwards, I need a second click to open an external URL.

Yes, there are several ways to use shared notes. For example, if the note is attached to a fact, or attached to a citation on a fact, it will appear in the facts/events tab. The title of the shared note is clickable, but there is also an expandable arrow next to the note that will display the text inline.

Taking this a step further, in the Control Panel > Tree (manage) > Preferences, there is a subheading "Individual pages". Under that, there are two preferences named "Automatically expand notes" and "Automatically expand sources". With these set to "yes" there will be no clicks needed to see the shared note. For cosmetic reasons, I set the first one to "yes" and the second to "no".
Last edit: 2 years 1 month ago by miqrogroove.

2 years 1 month ago - 2 years 1 month ago #19 by miqrogroove
Also, the first line of a webtrees shared note is supposed to be a title followed by two linefeeds. If you're putting a URL in the first line it might look strange.
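
The layout described here (a title line, two linefeeds, then the body) can be sketched in Python as follows. The function name, the link label, and the archive URL are hypothetical and only illustrate the shape of the text.

```python
def shared_note_text(title: str, links: dict[str, str]) -> str:
    """Compose shared-note text: a title line, an empty line, then markdown links."""
    lines = [f"[{label}]({url})" for label, url in links.items()]
    # The title comes first, followed by two linefeeds, then the body.
    return title + "\n\n" + "\n".join(lines)


print(shared_note_text(
    "University degree of Henry Miller",
    {"Archive object": "https://archive.example.org/Detail/objects/42"},
))
```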
Last edit: 2 years 1 month ago by miqrogroove.

2 years 1 month ago #20 by Jefferson49

miqrogroove wrote:
> Also, the first line of a webtrees shared note is supposed to be a title followed by two linefeeds. If you're putting a URL in the first line it might look strange.

Thank you for the additional information. After changing the settings in the control panel, I was able to show and click the shared notes. For providing a directly accessible link in the facts/events area, this seems to be the best option. From my first tests, I would prefer to put the link directly in the shared note title, because this is the option with the least text.

However, while thinking about the resulting data structures, adding shared notes to the source citations raises some doubts from the Gedcom data model point of view: The URL to a source should be directly assigned to the source itself and not to the source citations, which might multiply the information. It seems that assigning it to the citation is only necessary to see the link in the webtrees front end for facts/events.

Placing the information in SOUR:REPO:CALN, SOUR:REPO:NOTE, or SOUR:REFN seems to be more suitable. However, the drawback would be indirect linking from facts/events (i.e. open link to source first, then follow link to URL).

To get more insight, I also want to learn more about the effort for showing additional information in the front end.
