Web based family history software

Question New import code (fixes max_allowed_packet issues)

  • fisharebest
  • Topic Author
  • Offline
  • Administrator
  • Administrator
More
13 years 5 months ago #1 by fisharebest
Until now, webtrees has loaded gedcom file in a single operation.

On many servers, this causes problems when the file size is greater than the mysql setting "max allowed packet".

I have just rewritten the import code - it now loads files in blocks of 64KB, which should solve this problem, reduce memory requirements, etc.

Since this is a core function, it would be good if this could be thoroughly tested, on a variety of different servers.

If you can download snapshot build 9646 or later (e.g. launchpad.net/webtrees/trunk/9646/+download/webtrees-9646.zip ), and test this for me, it would be a great help.

Thanks,

Greg

Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net

Please Log in or Create an account to join the conversation.

  • ToyGuy
  • Offline
  • Moderator
  • Moderator
  • Live like it's Christmas every day - Santa Stephen
More
13 years 5 months ago #2 by ToyGuy
Greg
OUCH - I don't think I want to be doing many imports!

Config:
Mac OS-X 10.6.4
iCore 5 2.53ghz processor
8 gb ram
Apache 2.2.14, MySQL 5.1.47, PHP 5.3.2
48.1 mb GEDCOM with 85,140 INDI's, 28333 FAM's

- - - - - - Import Process - - - - - -
9:08a begin
9:19:45a (11:45 minutes) to start loading bar
10:29a (1 hr, 21 minutes) to reach 68%
10:44:52a (1 hr, 36 minutes, 52 seconds)

Again, OUCH

Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com , a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28

Please Log in or Create an account to join the conversation.

  • fisharebest
  • Topic Author
  • Offline
  • Administrator
  • Administrator
More
13 years 5 months ago #3 by fisharebest
I didn't notice any change in performance. Loading my gedcom (about 16MB) on to my dev system (a lower specification than your machine) took about 2.5 minutes before the change, and about 2.5 minutes afterwards.

Even deleting the old data (the pause before the progress bar starts moving) takes <10 seconds. If you're taking nearly 12 minutes to do this (just 14 delete statements) I would be tempted to look at your MySQL configuration.

Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net

Please Log in or Create an account to join the conversation.

More
13 years 5 months ago #4 by lexoulu
Used option 'Add new Gedcom'.
Loaded successfully in about 20-25 min wall-time.
No error shown.
MySQL:
max.packet.size = 1M
net buffer length = 8
Gedcom:
37'000 individuals.

Do you need other details?
- Lex

openSUSE 12.1, Apache 2.2, PHP 5.3, MySQL 5.5

Please Log in or Create an account to join the conversation.

  • fisharebest
  • Topic Author
  • Offline
  • Administrator
  • Administrator
More
13 years 5 months ago #5 by fisharebest
<<Do you need other details?>>

No - that's fine. Thanks for testing it.

Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net

Please Log in or Create an account to join the conversation.

  • fisharebest
  • Topic Author
  • Offline
  • Administrator
  • Administrator
More
13 years 5 months ago #6 by fisharebest
Stephen - you won't be happy to know that my 3 year old, low-power, budget laptop imported your gedcom in about 2/3 the time that your whizzy server took....

I'll take a look at your server configuration, to see if there is anything obvious that might be causing this....

Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net

Please Log in or Create an account to join the conversation.

  • ToyGuy
  • Offline
  • Moderator
  • Moderator
  • Live like it's Christmas every day - Santa Stephen
More
13 years 5 months ago - 13 years 5 months ago #7 by ToyGuy
Greg
That import was my laptop, not the whizzy server. Not about to take it down for an import I don't need to do. The Octo-Core server is much faster, but none have met the 10-12 minute imports that we had back in April, before you 'started messing' with the code to make it more flexible. Just letting you know.
Stephen

Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com , a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28
Last edit: 13 years 5 months ago by ToyGuy.

Please Log in or Create an account to join the conversation.

  • dxradio
  • Visitor
  • Visitor
13 years 5 months ago #8 by dxradio
My Internet host shared server MySQL 5.0.91 which previously failed to import my 11 MB GEDCOM due to max allowed packet = 2,097,152 ... now imports it successfully in 26 minutes.

PGV 4.2.3 imports the same GEDCOM in about 13 minutes

The process generated 171 rows in the (new) wt_gedcom_chunk table to achieve the import.

I'll try the home machine in a moment.

Please Log in or Create an account to join the conversation.

  • dxradio
  • Visitor
  • Visitor
13 years 5 months ago #9 by dxradio
The home machine (iMac) did the import in a little less than 2 minutes ... 169 rows in the wt_gedcom_chunk table
24374 INDI
12919 FAM
Total Events 54362

Similar to Greg, the deletion time was a small handful of seconds.

Please Log in or Create an account to join the conversation.

  • ToyGuy
  • Offline
  • Moderator
  • Moderator
  • Live like it's Christmas every day - Santa Stephen
More
13 years 5 months ago #10 by ToyGuy
Greg
Further - On the laptop, there are 771 chunk rows ranging from 34kb to 77.9Kib (more than several exceeded the 64Kib chunk limit.).
Interesting Mark's results given the earlier imports beat PGV by better than a 2x factor. IF PGV imported his GEDCOM in 13 minutes, the older webtrees import should have been less than 7 minutes and as few as 5 minutes.

Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com , a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28

Please Log in or Create an account to join the conversation.

  • fisharebest
  • Topic Author
  • Offline
  • Administrator
  • Administrator
More
13 years 5 months ago #11 by fisharebest
The number of chunks is just dependent on the size of the gedcom, and has no other real significance. They are only *approximately* 64KB long. We need to load the file as a number of blocks (to get round the max_allowed_packet issue), but we need to ensure we never split a multi-byte character between one block and the next. One way to do this is to split at gedcom record boundaries (newline followed by "0") - which also provides convenient blocks of data to import.

The actual importer just keeps processing 64KB blocks until it has consumed > 1 second of elapsed time, then refreshes the page and updates the progress bar. This keeps the memory and CPU requirements very small. It also allows you to interrupt the process (e.g. navigate away from the page and return to it).

It is all a bit of a trade-off. A 64KB block can contain just one long note (very quick to process), or lots of links and places (very slow to process). I came up with this figure after trying out a number of different servers. An information-dense block on a slow server takes a few seconds. Information-sparse blocks on a fast server take a fraction of a second. If you want to experiment with larger/smaller blocks sizes, just change the 65536 value in editgedcoms.php at line ~54.

<<That import was my laptop, not the whizzy server>>

My mistake - I assumed you had used a second/test database on the same server.

Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net

Please Log in or Create an account to join the conversation.

  • dxradio
  • Visitor
  • Visitor
13 years 5 months ago - 13 years 5 months ago #12 by dxradio
On the home iMac

PGV 4.2.3 import ... Execution time: 126 sec.

webtrees SVN
2010-10-10 14:38:48 config Gedcom setting "imported" set to ""
2010-10-10 14:40:40 config Gedcom setting "imported" set to "1"
which I calculate to be 112 seconds

So these are essentially the same. Machine is not otherwise heavily loaded.

The Internet server machine is shared so will be more variable and susceptible to delays.

I don't have any other large gedcoms to experiment with.
Last edit: 13 years 5 months ago by dxradio.

Please Log in or Create an account to join the conversation.

More
13 years 5 months ago - 13 years 5 months ago #13 by lexoulu
Using the pgv-wizard-wt alternative / SVN 9658

Loaded successfully in about 2 min elapsed time.
Performed on test server (as also in previous message) with same configuration as in signature prod server.
Processor (CPU): AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
MySQL:
max.packet.size = 1M
otherwise default MySQL conf. values
Gedcom:
37'000 individuals, 10500 fam, 50500 events
Apache default conf. values

openSUSE 12.1, Apache 2.2, PHP 5.3, MySQL 5.5
Last edit: 13 years 5 months ago by lexoulu. Reason: SVN comment

Please Log in or Create an account to join the conversation.

  • ToyGuy
  • Offline
  • Moderator
  • Moderator
  • Live like it's Christmas every day - Santa Stephen
More
13 years 5 months ago #14 by ToyGuy
Lex
While no expert, I don't believe that the PGV-to-webtreeswizard would be impacted by the import code, but I could be wrong.

Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com , a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28

Please Log in or Create an account to join the conversation.

More
13 years 5 months ago #15 by lexoulu
Stephen,
I'm not sure what you mean, but though you might be referring to the difference in method of importing 1) using wizard, or 2) adding a new gedcom.
I was interested in the time difference, so I performed the 'Add a new GEDCOM' again, just to check.
With the same data (37'000 individulas, SVN 9658 etc), the GEDCOM file was successfully loaded in 17 - 18 minutes elapsed time.
Comparing this to the wizard method, there is a significant difference. But, I'm not really surprised, as it is a difference between database and file handlling methods.

-Lex

openSUSE 12.1, Apache 2.2, PHP 5.3, MySQL 5.5

Please Log in or Create an account to join the conversation.

Powered by Kunena Forum
}