Question
Too many CPU Seconds
- thomas52
-
Topic Author
- Offline
- Premium Member
-
- Western North Carolina
What is this? How did I get here? And how do I fix it?
Research is what I’m doing when I don’t know what I’m doing – Wernher von Braun
- norwegian_sardines
-
- Offline
- Platinum Member
-
- Posts: 2569
Based on my limited knowledge, it looks like your host limits the amount of work (CPU cycles) to what you have purchased.
Are you making a lot of updates? Did you just load a new GEDCOM? Are search-engine robots crawling your site?
Have you talked to your host about what this means and how they can make it better for you?
Remember, webtrees uses a database and needs SQL queries to find data. Is this host geared more towards static web pages, and do they have other plans that support more data-intensive websites?
Ken
- thomas52
-
Topic Author
- Offline
- Premium Member
-
- Western North Carolina
block_asn="AS45899=VNPTCorp"
... but checking recent traffic, I AM STILL GETTING MANY HITS FROM THESE PEOPLE - THE BLOCK IS NOT WORKING!
- kiwi
-
- Offline
- Platinum Member
-
You need to check every IP address from your recent logs to find all the ASNs to block, then list all of them in your config file.
Then repeat regularly (daily or weekly) until they’re all stopped.
Nigel
www.our-families.info
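As a starting point for that log review, here is a minimal Python sketch that counts requests per client IP. It assumes the common Apache/nginx "combined" log format, where the client IP is the first whitespace-separated field of each line. The function name `top_client_ips` and the sample lines are made up for illustration. Mapping each IP to its ASN still needs a separate lookup (a whois query, for example), which is not shown here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_client_ips(log_lines: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count requests per client IP and return the busiest ones first.

    Assumption: the client IP is the first whitespace-separated field,
    as in the Apache/nginx "combined" log format.
    """
    hits = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            hits[fields[0]] += 1
    return hits.most_common(limit)


# Made-up sample lines using documentation-only IP addresses.
sample_log = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /index.php HTTP/1.1" 200 512 "-" "Opera/8.32"',
    '203.0.113.7 - - [01/Jan/2025:00:00:02 +0000] "GET /index.php HTTP/1.1" 200 512 "-" "Opera/9.13"',
    '198.51.100.23 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

for ip, count in top_client_ips(sample_log):
    print(f"{count:6d}  {ip}")
```

On a real server you would pass an open log file instead of `sample_log`, since a file object also yields one line at a time.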
- thomas52
-
Topic Author
- Offline
- Premium Member
-
- Western North Carolina
BUT, why was this not working? (MANY Vietnam)
block_asn="AS45899"
- kiwi
-
- Offline
- Platinum Member
-
Quote: "No, not all, but many. I found an option to block countries, so I have blocked China, Vietnam, Russia & Brazil."
Was this done through some feature your web host provides? If so, you should talk to them about why it’s not working. If they can get that working, you won’t need the webtrees ASN blocking.
If not your host, how did you do it?
Nigel
www.our-families.info
- photon flip
-
- Offline
- Junior Member
-
The majority of hits seem to be calendar requests, covering every possible date permutation. These pages are set to noindex by default, but the bots ignore that. Is there a way to block requests to calendar urls?
MurrayJ
Fuller-Bennett Family Tree
- thomas52
-
Topic Author
- Offline
- Premium Member
-
- Western North Carolina
Nigel: When the block_asn setting I added to config.ini.php failed to work, I looked for another solution. My host has a tool to block IP ranges or countries. China, Vietnam & Russia are constant problems and are now blocked, so we will see.
- bertkoor
-
- Offline
- Platinum Member
-
- Greetings from Utrecht, Holland
Quote: "Is there a way to block requests to calendar urls?"
Control Panel -> Modules-Genealogy-Menus -> Calendar -> show to members only
stamboom.BertKoor.nl runs on webtrees v2.2.1
- photon flip
-
- Offline
- Junior Member
-
I don't know if the 600,000 hits on calendar are malicious or just poorly configured bots, but it's crazy trying to play whack-a-mole with blocks.
I should note I also get a lot of bots targeting the contact URL.
I'm going to talk to my hosting provider, SiteGround. They advertise that they try to stop DDoS-style attacks, but it's obviously not working.
MurrayJ
Fuller-Bennett Family Tree
- bertkoor
-
- Offline
- Platinum Member
-
- Greetings from Utrecht, Holland
Ah yes, now I see... the calendar module only controls the menu; the calendar itself is a core function for date display and is accessible by URL.
Maybe easiest is to put something like this in resources/views/calendar-page.phtml
Quote: "I don't know if the 600,000 hits on calendar are malicious or just poorly configured bots"
These are surely bots that ignore all "no trespassing" signs and just scrape any content they can find. Not really malicious, just plain dumb. I'd guess that they want web content to train their new LargeLanguageModel on. The bots don't realise that the content they get is rather useless for that.
Quote: "they advertise that they try to stop DDoS-style attacks, but it's obviously not working."
Simply stated, if your site is reachable for everybody then there is no 'denial of service' caused by the traffic and hence it's working.
You don't know how much bot traffic is actually stopped by them already.
stamboom.BertKoor.nl runs on webtrees v2.2.1
- fisharebest
-
- Offline
- Administrator
-
I spotted this a few days ago.
For the calendar page, we set "meta robots=noindex".
This should be "meta robots=noindex,nofollow".
Without nofollow, even legitimate crawlers keep following the links on each calendar page, so they end up trying to visit every date in history...
I'll push out a new release with this fix in the next day or two.
But I am also having problems with crawlers - it seems that everybody in the world wants to "download the entire internet" to train their AI systems.
They typically use 1000s of IP addresses on the same network, so you can't just block IPs. The same IP address will use loads of different user-agent strings, so you can't block by that.
Most of these come from China, and I've blocked every major network in China. Now I'm seeing the same pattern from various countries in south-east Asia and Africa.
If anyone has any suggestions for how to identify/block these crawlers, let me know!
Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net
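Since these crawlers spread their requests across thousands of addresses on the same network, counting hits per address can hide the pattern. A rough Python sketch of grouping addresses by their enclosing network follows. The function name `hits_per_network` and the /24 (IPv4) and /48 (IPv6) prefix lengths are arbitrary assumptions for illustration; real networks and ASNs come in many sizes, so treat the output as a hint about where to look, not as a block list.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def hits_per_network(ips: Iterable[str], v4_prefix: int = 24, v6_prefix: int = 48) -> Counter:
    """Count requests per enclosing network instead of per single address."""
    counts = Counter()
    for raw in ips:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # ignore anything that is not a valid IP address
        prefix = v4_prefix if addr.version == 4 else v6_prefix
        # strict=False accepts a host address and returns the network it belongs to.
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        counts[str(network)] += 1
    return counts


# Documentation-only addresses, invented for this example.
sample_ips = [
    "203.0.113.7",
    "203.0.113.200",
    "203.0.113.45",
    "198.51.100.23",
    "2001:db8::1",
    "not-an-ip",
]

for network, count in hits_per_network(sample_ips).most_common():
    print(f"{count:6d}  {network}")
```

Three addresses that each appear once in the log show up here as a single network with three hits, which is the kind of cluster that per-IP counts would miss.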
- photon flip
-
- Offline
- Junior Member
-
It will be interesting to see how much difference the noindex,nofollow change makes.
The AI training problem seems to be a growing issue.
You would think hosting providers will need to do more to address this from their end - eventually.
In the meantime I've resorted to blocking China altogether - still the main perpetrator.
I call it the Trump method, and likewise it ultimately won't solve the problem.
MurrayJ
Fuller-Bennett Family Tree
- fisharebest
-
- Offline
- Administrator
-
Looking at my logs, I now see lots of weird user-agent strings.
Since the Opera ones are quite short, I'll use them as an example - but I see the same for Chrome, Firefox, Safari, Edge, iPad, iPod, iPhone, etc.
In the last 60 seconds, I have seen these:
Opera/8.13.(Windows 98; yi-US) Presto/2.9.185 Version/10.00
Opera/8.19.(X11; Linux x86_64; quz-PE) Presto/2.9.173 Version/12.00
Opera/8.32.(Windows NT 5.01; wo-SN) Presto/2.9.188 Version/10.00
Opera/8.52.(X11; Linux x86_64; ug-CN) Presto/2.9.183 Version/12.00
Opera/8.63.(Windows NT 6.0; niu-NZ) Presto/2.9.187 Version/12.00
Opera/8.70.(Windows NT 5.01; csb-PL) Presto/2.9.170 Version/10.00
Opera/8.78.(Windows NT 5.01; tr-CY) Presto/2.9.161 Version/12.00
Opera/8.94.(Windows NT 6.0; wae-CH) Presto/2.9.175 Version/11.00
Opera/9.10.(X11; Linux i686; cs-CZ) Presto/2.9.161 Version/10.00
Opera/9.13.(Windows NT 6.1; an-ES) Presto/2.9.185 Version/11.00
Opera/9.20.(X11; Linux x86_64; ha-NG) Presto/2.9.176 Version/11.00
Opera/9.21.(X11; Linux x86_64; fi-FI) Presto/2.9.188 Version/12.00
Opera/9.45.(Windows 98; pl-PL) Presto/2.9.166 Version/11.00
Opera/9.55.(X11; Linux i686; gv-GB) Presto/2.9.171 Version/10.00
Opera/9.59.(X11; Linux x86_64; nso-ZA) Presto/2.9.181 Version/11.00
Opera/9.62.(X11; Linux x86_64; sat-IN) Presto/2.9.164 Version/11.00
Opera/9.66.(X11; Linux i686; hu-HU) Presto/2.9.170 Version/11.00
Opera/9.85.(X11; Linux i686; is-IS) Presto/2.9.175 Version/12.00
Opera/9.88.(Windows NT 11.0; lzh-TW) Presto/2.9.187 Version/12.00
The version numbers are bogus. For example, there was never a version 8.32. Even if there had been, it would have been released 20 years ago.
Also, Opera 8 used Presto/1.x (not 2.x).
The language codes (build language) are pretty random, and don't correlate to the location of the IP address.
For example, I doubt there was ever a Manx (gv-GB) build of Opera, and the IP address for that request is in Bangladesh.
So, clearly, these are randomly generated user-agent strings, and they are coming from many 1000s of different IP addresses around the world.
Some IP addresses are only used once per day.
They are often fetching obscure pages from webtrees, such as the Jalali calendar page.
They represent 95-99% of the traffic to my server - which is impacting the performance.
I presume many other sites are also affected.
Does anyone know of any resources that might help us distinguish these bogus user-agent strings from valid ones?
Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net
- Sir Peter
-
- Offline
- Premium Member
-
- Posts: 517
Unfortunately, blocking IPs, ASNs and/or user agents is a Sisyphean task and not really successful.
What about creating an optional webtrees module for rate limiting at the application level, with a configurable maximum number of requests per time frame? This module could send an HTTP status code of 429, or a random status code from a list of 4xx or 5xx codes, to make webtrees unattractive to the bot.
Peter
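To make the rate-limiting idea concrete, here is a small Python sketch of a sliding-window limiter that answers with 429 once a client exceeds its quota. It is not a webtrees module (those are written in PHP) and only illustrates the logic. The class name `SlidingWindowLimiter`, its parameters, and the in-memory storage are assumptions for this example; a real deployment would need shared storage across processes and a decision about what identifies a "client".

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client key -> request timestamps

    def status_for(self, client: str, now: Optional[float] = None) -> int:
        """Return 200 if the request is allowed, or 429 if over the limit."""
        if now is None:
            now = time.monotonic()
        timestamps = self._history[client]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return 429  # Too Many Requests
        timestamps.append(now)
        return 200


# Example: 3 requests per 60 seconds, with explicit timestamps for clarity.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
statuses = [limiter.status_for("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
print(statuses)
```

Because the crawlers described in this thread rotate through many addresses, keying the limiter on a single IP may catch very little. Keying on a network prefix is one possible variation, at the cost of occasionally throttling real visitors who share that network.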
- fisharebest
-
- Offline
- Administrator
-
So on my server, I have added blocks for these:
* Chrome versions 1-99
* Firefox versions 1-69
* Opera versions 8-9
* All "Trident"-based browsers
These are all very old/unsupported browsers, so I'm 99% confident that I'm not blocking any legitimate users.
It has been 100% successful in blocking these crawlers.
Until I can be 100% certain that this does not affect any valid users/browsers, I am hesitant to add the same logic to webtrees.
Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net
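The four rules listed above can be sketched as a user-agent check in Python. This is not the actual server configuration, which is not shown in the thread. The regular expressions and the function name `is_blocked_user_agent` are assumptions about how such rules might be written, and the caution above about possibly affecting valid users applies equally here.

```python
import re

# Hypothetical patterns approximating the rules listed in the post above.
_CHROME = re.compile(r"\bChrome/(\d+)\.")
_FIREFOX = re.compile(r"\bFirefox/(\d+)\.")
_OPERA = re.compile(r"^Opera/(\d+)\.")
_TRIDENT = re.compile(r"\bTrident/")


def is_blocked_user_agent(user_agent: str) -> bool:
    """Return True if the user-agent string claims a very old browser."""
    if _TRIDENT.search(user_agent):
        return True  # all Trident-based browsers
    match = _CHROME.search(user_agent)
    if match and 1 <= int(match.group(1)) <= 99:
        return True  # Chrome versions 1-99
    match = _FIREFOX.search(user_agent)
    if match and 1 <= int(match.group(1)) <= 69:
        return True  # Firefox versions 1-69
    match = _OPERA.match(user_agent)
    if match and 8 <= int(match.group(1)) <= 9:
        return True  # Opera versions 8-9
    return False


examples = [
    "Opera/8.32.(Windows NT 5.01; wo-SN) Presto/2.9.188 Version/10.00",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

for ua in examples:
    print("BLOCK" if is_blocked_user_agent(ua) else "allow", ua)
```

A check like this only works while the crawlers keep generating old version numbers; if they switch to current ones, the rules stop matching.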