Question New import code (fixes max_allowed_packet issues)
- fisharebest
- Topic Author
- Offline
- Administrator
On many servers, this causes problems when the file size is greater than the mysql setting "max allowed packet".
I have just rewritten the import code - it now loads files in blocks of 64KB, which should solve this problem, reduce memory requirements, etc.
Since this is a core function, it would be good if this could be thoroughly tested, on a variety of different servers.
If you can download snapshot build 9646 or later (e.g. launchpad.net/webtrees/trunk/9646/+download/webtrees-9646.zip ), and test this for me, it would be a great help.
Thanks,
Greg
Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net
Please Log in or Create an account to join the conversation.
- ToyGuy
- Offline
- Moderator
- Live like it's Christmas every day - Santa Stephen
OUCH - I don't think I want to be doing many imports!
Config:
Mac OS-X 10.6.4
iCore 5 2.53ghz processor
8 gb ram
Apache 2.2.14, MySQL 5.1.47, PHP 5.3.2
48.1 mb GEDCOM with 85,140 INDI's, 28333 FAM's
- - - - - - Import Process - - - - - -
9:08a begin
9:19:45a (11:45 minutes) to start loading bar
10:29a (1 hr, 21 minutes) to reach 68%
10:44:52a (1 hr, 36 minutes, 52 seconds)
Again, OUCH
Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com , a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28
Please Log in or Create an account to join the conversation.
- fisharebest
- Topic Author
- Offline
- Administrator
Even deleting the old data (the pause before the progress bar starts moving) takes <10 seconds. If you're taking nearly 12 minutes to do this (just 14 delete statements) I would be tempted to look at your MySQL configuration.
Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net
Please Log in or Create an account to join the conversation.
- lexoulu
- Offline
- New Member
Loaded successfully in about 20-25 min wall-time.
No error shown.
MySQL:
max.packet.size = 1M
net buffer length = 8
Gedcom:
37'000 individuals.
Do you need other details?
- Lex
openSUSE 12.1, Apache 2.2, PHP 5.3, MySQL 5.5
Please Log in or Create an account to join the conversation.
- fisharebest
- Topic Author
- Offline
- Administrator
No - that's fine. Thanks for testing it.
Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net
Please Log in or Create an account to join the conversation.
- fisharebest
- Topic Author
- Offline
- Administrator
I'll take a look at your server configuration, to see if there is anything obvious that might be causing this....
Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net
Please Log in or Create an account to join the conversation.
- ToyGuy
- Offline
- Moderator
- Live like it's Christmas every day - Santa Stephen
That import was my laptop, not the whizzy server. Not about to take it down for an import I don't need to do. The Octo-Core server is much faster, but none have met the 10-12 minute imports that we had back in April, before you 'started messing' with the code to make it more flexible. Just letting you know.
Stephen
Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com , a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28
Please Log in or Create an account to join the conversation.
- dxradio
- Visitor
PGV 4.2.3 imports the same GEDCOM in about 13 minutes
The process generated 171 rows in the (new) wt_gedcom_chunk table to achieve the import.
I'll try the home machine in a moment.
Please Log in or Create an account to join the conversation.
- dxradio
- Visitor
24374 INDI
12919 FAM
Total Events 54362
Similar to Greg, the deletion time was a small handful of seconds.
Please Log in or Create an account to join the conversation.
- ToyGuy
- Offline
- Moderator
- Live like it's Christmas every day - Santa Stephen
Further - On the laptop, there are 771 chunk rows ranging from 34kb to 77.9Kib (more than several exceeded the 64Kib chunk limit.).
Interesting Mark's results given the earlier imports beat PGV by better than a 2x factor. IF PGV imported his GEDCOM in 13 minutes, the older webtrees import should have been less than 7 minutes and as few as 5 minutes.
Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com , a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28
Please Log in or Create an account to join the conversation.
- fisharebest
- Topic Author
- Offline
- Administrator
The actual importer just keeps processing 64KB blocks until it has consumed > 1 second of elapsed time, then refreshes the page and updates the progress bar. This keeps the memory and CPU requirements very small. It also allows you to interrupt the process (e.g. navigate away from the page and return to it).
It is all a bit of a trade-off. A 64KB block can contain just one long note (very quick to process), or lots of links and places (very slow to process). I came up with this figure after trying out a number of different servers. An information-dense block on a slow server takes a few seconds. Information-sparse blocks on a fast server take a fraction of a second. If you want to experiment with larger/smaller blocks sizes, just change the 65536 value in editgedcoms.php at line ~54.
<<That import was my laptop, not the whizzy server>>
My mistake - I assumed you had used a second/test database on the same server.
Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net
Please Log in or Create an account to join the conversation.
- dxradio
- Visitor
PGV 4.2.3 import ... Execution time: 126 sec.
webtrees SVN
2010-10-10 14:38:48 config Gedcom setting "imported" set to ""
2010-10-10 14:40:40 config Gedcom setting "imported" set to "1"
which I calculate to be 112 seconds
So these are essentially the same. Machine is not otherwise heavily loaded.
The Internet server machine is shared so will be more variable and susceptible to delays.
I don't have any other large gedcoms to experiment with.
Please Log in or Create an account to join the conversation.
- lexoulu
- Offline
- New Member
Loaded successfully in about 2 min elapsed time.
Performed on test server (as also in previous message) with same configuration as in signature prod server.
Processor (CPU): AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
MySQL:
max.packet.size = 1M
otherwise default MySQL conf. values
Gedcom:
37'000 individuals, 10500 fam, 50500 events
Apache default conf. values
openSUSE 12.1, Apache 2.2, PHP 5.3, MySQL 5.5
Please Log in or Create an account to join the conversation.
- ToyGuy
- Offline
- Moderator
- Live like it's Christmas every day - Santa Stephen
While no expert, I don't believe that the PGV-to-webtreeswizard would be impacted by the import code, but I could be wrong.
Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com , a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28
Please Log in or Create an account to join the conversation.
- lexoulu
- Offline
- New Member
I'm not sure what you mean, but though you might be referring to the difference in method of importing 1) using wizard, or 2) adding a new gedcom.
I was interested in the time difference, so I performed the 'Add a new GEDCOM' again, just to check.
With the same data (37'000 individulas, SVN 9658 etc), the GEDCOM file was successfully loaded in 17 - 18 minutes elapsed time.
Comparing this to the wizard method, there is a significant difference. But, I'm not really surprised, as it is a difference between database and file handlling methods.
-Lex
openSUSE 12.1, Apache 2.2, PHP 5.3, MySQL 5.5
Please Log in or Create an account to join the conversation.