Web based family history software

Question: Too many CPU Seconds

1 month 1 week ago #1 by thomas52
Too many CPU Seconds was created by thomas52
SiteGround tells me: We would like to warn you that your adkins.ws hosting plan has exceeded 80% of its allowed CPU seconds quota. Please note that once you hit 100% of the allowed CPU seconds, your web service will be limited till the end of the current calendar month and your site may become inaccessible.
What is this?  How did I get here?  And how do I fix it?

Research is what I’m doing when I don’t know what I’m doing – Wernher von Braun

1 month 1 week ago - 1 month 1 week ago #2 by norwegian_sardines
Replied by norwegian_sardines on topic Too many CPU Seconds
Is this a new or newer host?

Based on my limited knowledge, it looks like your host limits the amount of work (CPU time) to what you have purchased.

Are you making a lot of updates? Did you just load a new GEDCOM? Are search robots crawling your site?

Have you talked to your host about what this means and how they can make it better for you?

Remember, webtrees uses a database and requires SQL queries to find data! Is this host more geared to supporting static web pages, and do they have other plans that support more data-intensive websites?

Ken
Last edit: 1 month 1 week ago by norwegian_sardines.

1 month 1 week ago #3 by thomas52
Replied by thomas52 on topic Too many CPU Seconds
I have been with this host several years & upgraded to 2.2 about 6 weeks ago. I got several similar warnings about the time I was upgrading, but nothing since until this. I wouldn't normally worry about it, but 80% after only 8 days? And talking to these people is near impossible - they give me non-answers or refer me to their 'help center.'

1 month 1 week ago #4 by thomas52
Replied by thomas52 on topic Too many CPU Seconds
Had problems with heavy traffic a few weeks ago, and modified my config.ini.php to include
block_asn="AS45899=VNPTCorp"
... but checking recent traffic, I AM STILL GETTING MANY HITS FROM THESE PEOPLE - THE BLOCK IS NOT WORKING!

1 month 1 week ago #5 by kiwi
Replied by kiwi on topic Too many CPU Seconds
Are you sure all your log entries are for that single ASN?

You need to check every IP address from your recent logs to find all the ASNs to block, then list all of them in your config file.

Then repeat regularly (daily or weekly) until they’re all stopped.
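As a starting point for that log review, here is a minimal Python sketch that counts requests per client IP address. It assumes the host provides an access log in the common/combined format, where the client address is the first field on each line; the function name and the sample lines are made up for illustration. Mapping each IP address to its ASN still needs a separate lookup (for example a whois query), which is not shown here.

```python
from collections import Counter


def top_client_ips(log_lines, limit=10):
    """Count requests per client IP in common/combined log format lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:
            # The first whitespace-separated field is the client address.
            counts[fields[0]] += 1
    return counts.most_common(limit)


# Hypothetical sample lines; in practice, iterate over the real log file.
sample = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /a HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2025:00:00:02 +0000] "GET /b HTTP/1.1" 200 512',
    '198.51.100.4 - - [01/Jan/2025:00:00:03 +0000] "GET /c HTTP/1.1" 200 512',
]

for ip, hits in top_client_ips(sample):
    print(ip, hits)
```

The busiest addresses are the ones worth looking up first.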

1 month 1 week ago #6 by thomas52
Replied by thomas52 on topic Too many CPU Seconds
No, not all, but many. I found an option to block countries, so I have blocked China, Vietnam, Russia & Brazil.
BUT, why was this not working? (MANY Vietnam)
block_asn="AS45899"

1 month 1 week ago #7 by kiwi
Replied by kiwi on topic Too many CPU Seconds

> No, not all, but many. I found an option to block countries, so I have blocked China, Vietnam, Russia & Brazil.
Was this done through some feature your web host provides? If so, you should talk to them about why it’s not working. If they can get that working, you won’t need the webtrees ASN blocking.
If not your host, how did you do it?

1 month 1 week ago #8 by photon flip
Replied by photon flip on topic Too many CPU Seconds
I've been having similar problems - 685,000 hits at times. I blocked many ASNs. I thought about blocking countries, particularly China, but lately I see hits from random countries.
The majority of hits seem to be calendar requests - every possible date permutation. These pages are set to noindex by default, but that is ignored. Is there a way to block requests to calendar URLs?
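To put a number on "the majority", a small Python sketch like the following could report how many logged requests target calendar URLs. It assumes a common/combined format access log and assumes that calendar pages can be recognised by "/calendar" appearing in the URL-decoded request path; both the sample lines and that URL pattern are illustrative guesses, so check them against your own log before trusting the result.

```python
import re
from urllib.parse import unquote

# Captures the request target from the quoted request line of a log entry.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+)')


def calendar_share(log_lines):
    """Return (calendar_requests, total_requests) for the given log lines."""
    total = 0
    calendar = 0
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue
        total += 1
        # Decode %2F etc. so encoded route parameters are matched as well.
        if "/calendar" in unquote(match.group(1)).lower():
            calendar += 1
    return calendar, total


# Hypothetical sample lines for illustration only.
sample = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /index.php?route=%2Ftree%2Fdemo%2Fcalendar%2Fday HTTP/1.1" 200 512',
    '203.0.113.8 - - [01/Jan/2025:00:00:02 +0000] "GET /tree/demo/calendar/month?year=1850 HTTP/1.1" 200 512',
    '198.51.100.4 - - [01/Jan/2025:00:00:03 +0000] "GET /tree/demo/individual/I1 HTTP/1.1" 200 512',
]

hits, total = calendar_share(sample)
print(f"{hits} of {total} requests were for calendar pages")
```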

1 month 1 week ago #9 by thomas52
Replied by thomas52 on topic Too many CPU Seconds
Murray: I'm getting much the same, but the vast majority from "problem countries."
Nigel: When the block_asn added to config.ini.php failed to work, I looked for another solution. My host has a tool to block IP ranges or countries. China, Vietnam & Russia are constant problems and are now blocked, so we will see.

1 month 1 week ago #10 by bertkoor
Replied by bertkoor on topic Too many CPU Seconds

> Is there a way to block requests to calendar URLs?
Control Panel -> Modules -> Genealogy -> Menus -> Calendar -> show to members only

stamboom.BertKoor.nl runs on webtrees v2.2.1

1 month 1 week ago #11 by photon flip
Replied by photon flip on topic Too many CPU Seconds
Thanks for the suggestion, but restricting or disabling the calendar menu only stops access to the menu on the front end by real users. The calendar functionality, and hence its URL, is used in several other ways. For example, on a facts and events tab, clicking on a date brings up the calendar, and it can also be accessed directly by URL by bad robots.

1 month 1 week ago #12 by bertkoor
Replied by bertkoor on topic Too many CPU Seconds
If you don't use it yourself (I never do) then you can disable the complete Calendar module.

1 month 1 week ago #13 by photon flip
Replied by photon flip on topic Too many CPU Seconds
I don't use it either and have it disabled, but the calendar module only controls the menu; the calendar is a core function for date display and is accessible by URL.
I don't know if the 600,000 hits on calendar are malicious or just poorly configured bots, but it's crazy trying to play whack-a-mole with blocks.

I should note I also get a lot of bots targeting the contact URL.

I'm going to talk to my hosting provider, SiteGround. They advertise that they try to stop DDoS-style attacks, but it's obviously not working.

1 month 1 week ago - 1 month 1 week ago #14 by bertkoor
Replied by bertkoor on topic Too many CPU Seconds

> the calendar module only controls the menu, the calendar is a core function for date display and is accessible by URL.
Ah yes, now I see...

Maybe easiest is to put something like this in resources/views/calendar-page.phtml
Code:
<?php if (Auth::check()) : ?>
    <table ... (the whole page that was there)
    <?= view('modals/ajax') ?>
<?php else : ?>
    <?= view('components/alert-warning-dismissible', ['alert' => 'Sorry, nothing to see here. You must be logged in.']) ?>
<?php endif ?>

> I don't know if the 600,000 hits on calendar are malicious or just poorly configured bots
These are surely bots that ignore all "no trespassing" signs and just scrape any content they can find. Not really malicious, just plain dumb. I'd guess that they want web content to train their new large language model on. The bots don't realise that the content they get is rather useless for that.

> they advertise that they try to stop DDoS-style attacks but it's obviously not working.
Simply stated, if your site is reachable for everybody then there is no 'denial of service' caused by the traffic and hence it's working.
You don't know how much bot traffic is actually stopped by them already.

Last edit: 1 month 1 week ago by bertkoor.

1 month 5 days ago #15 by fisharebest
Replied by fisharebest on topic Too many CPU Seconds
> The majority of hits seem to be calendar requests - every possible date permutation.

I spotted this a few days ago.

For the calendar page, we set "meta robots=noindex".

This should be "meta robots=noindex,nofollow".

The result is that even legitimate crawlers are trying to visit every date in history...

I'll push out a new release with this fix in the next day or two.

But I am also having problems with crawlers - it seems that everybody in the world wants to "download the entire internet" to train their AI systems.

They typically use 1000s of IP addresses on the same network, so you can't just block IPs. The same IP address will use loads of different user-agent strings, so you can't block by that.
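When thousands of addresses sit on the same network, grouping log entries by network prefix rather than by single address makes the pattern easier to see. The Python sketch below does that with the standard ipaddress module. The /16 prefix length is an arbitrary assumption for IPv4 and does not correspond to real network or ASN boundaries, and the function name and sample addresses are invented for illustration.

```python
import ipaddress
from collections import Counter


def hits_per_network(ips, prefix=16):
    """Group IPv4 addresses by their enclosing /prefix network and count them."""
    counts = Counter()
    for ip in ips:
        # strict=False lets us pass a host address and get its network back.
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(network)] += 1
    return counts.most_common()


# Hypothetical addresses taken from documentation ranges.
sample_ips = ["198.51.100.1", "198.51.7.9", "203.0.113.5"]

for network, hits in hits_per_network(sample_ips):
    print(network, hits)
```

A network that shows up with many distinct addresses is a candidate for a range or ASN block rather than individual IP blocks.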

Most of these come from China, and I've blocked every major network in China. Now I'm seeing the same pattern from various countries in south-east Asia and Africa.

If anyone has any suggestions for how to identify/block these crawlers, let me know!

Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net

1 month 5 days ago #16 by photon flip
Replied by photon flip on topic Too many CPU Seconds
Thank you, Greg, for clarifying this problem, or problems, as it turns out.
It will be interesting to see how much difference the noindex,nofollow change makes.

The AI training problem seems to be a growing issue.
You would think hosting providers will need to do more to address this from their end - eventually.

In the meantime I've resorted to blocking China altogether - still the main perpetrator.
I call it the Trump method and likewise ultimately it won't solve it :-) 

1 month 5 days ago #17 by thomas52
Replied by thomas52 on topic Too many CPU Seconds
You should block Viet Nam as well, and seriously consider blocking Russia. (And something is going on in Brazil.)

1 month 3 days ago #18 by fisharebest
Replied by fisharebest on topic Too many CPU Seconds
Blocking China has helped about 50%.

Looking at my logs, I now see lots of weird user-agent strings.  

Since the Opera ones are quite short, I'll use them as an example - but I see the same for Chrome, Firefox, Safari, Edge, iPad, iPod, iPhone, etc.

In the last 60 seconds, I have seen these:

Opera/8.13.(Windows 98; yi-US) Presto/2.9.185 Version/10.00
Opera/8.19.(X11; Linux x86_64; quz-PE) Presto/2.9.173 Version/12.00
Opera/8.32.(Windows NT 5.01; wo-SN) Presto/2.9.188 Version/10.00
Opera/8.52.(X11; Linux x86_64; ug-CN) Presto/2.9.183 Version/12.00
Opera/8.63.(Windows NT 6.0; niu-NZ) Presto/2.9.187 Version/12.00
Opera/8.70.(Windows NT 5.01; csb-PL) Presto/2.9.170 Version/10.00
Opera/8.78.(Windows NT 5.01; tr-CY) Presto/2.9.161 Version/12.00
Opera/8.94.(Windows NT 6.0; wae-CH) Presto/2.9.175 Version/11.00
Opera/9.10.(X11; Linux i686; cs-CZ) Presto/2.9.161 Version/10.00
Opera/9.13.(Windows NT 6.1; an-ES) Presto/2.9.185 Version/11.00
Opera/9.20.(X11; Linux x86_64; ha-NG) Presto/2.9.176 Version/11.00
Opera/9.21.(X11; Linux x86_64; fi-FI) Presto/2.9.188 Version/12.00
Opera/9.45.(Windows 98; pl-PL) Presto/2.9.166 Version/11.00
Opera/9.55.(X11; Linux i686; gv-GB) Presto/2.9.171 Version/10.00
Opera/9.59.(X11; Linux x86_64; nso-ZA) Presto/2.9.181 Version/11.00
Opera/9.62.(X11; Linux x86_64; sat-IN) Presto/2.9.164 Version/11.00
Opera/9.66.(X11; Linux i686; hu-HU) Presto/2.9.170 Version/11.00
Opera/9.85.(X11; Linux i686; is-IS) Presto/2.9.175 Version/12.00
Opera/9.88.(Windows NT 11.0; lzh-TW) Presto/2.9.187 Version/12.00

The version numbers are bogus. For example, there was never a version 8.32. Even if there had been, it would have been released 20 years ago.

Also, Opera 8 used Presto/1.x (not 2.x).

The language codes (build language) are pretty random, and don't correlate with the location of the IP address.
For example, I doubt there was ever a Manx (gv-GB) build of Opera, and the IP address is in Bangladesh.
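One of these inconsistencies is easy to test mechanically: a string that claims Opera 8 together with a Presto 2.x engine. The Python sketch below checks only that single rule, so it will not flag the bogus Opera 9.x strings in the list above. The function name and regular expression are illustrative assumptions, not an established detection library.

```python
import re

# Captures the Opera major version and the Presto major version.
UA_RE = re.compile(r"^Opera/(\d+)\.\d+.*\bPresto/(\d+)\.")


def opera_presto_mismatch(user_agent):
    """True when the string claims Opera 8 together with Presto 2.x or later."""
    match = UA_RE.search(user_agent)
    if match is None:
        return False
    opera_major = int(match.group(1))
    presto_major = int(match.group(2))
    return opera_major == 8 and presto_major >= 2


print(opera_presto_mismatch(
    "Opera/8.13.(Windows 98; yi-US) Presto/2.9.185 Version/10.00"
))
```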

So, clearly, these are randomly generated user-agent strings, and they are coming from many 1000s of different IP addresses around the world.
Some IP addresses are only used once per day.

They are often fetching obscure pages from webtrees, such as the Jalali calendar page.

They represent 95-99% of the traffic to my server - which is impacting the performance.
I presume many other sites are also affected.

Does anyone know of any resources that might help us distinguish these bogus user-agent strings from valid ones?
 

1 month 3 days ago #19 by Sir Peter
Replied by Sir Peter on topic Too many CPU Seconds
Maybe search for "user agents" on GitHub. There seem to be lists of valid agents and lists of crawlers, and even PHP libraries which deal with this issue. The only repos I had a look at are monperrus/crawler-user-agents and jaybizzle/crawler-detect.

Unfortunately blocking IPs, ASNs and/or user agents is a Sisyphean task and not really successful.

What about creating an optional webtrees module for rate limiting at the application level, with a configurable parameter for the maximum number of requests per time frame? This module could send an HTTP status code of 429, or a random status code from a list of 4xx or 5xx status codes, to make webtrees unattractive for the bot.
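To illustrate the idea only, here is a minimal sliding-window rate limiter in Python. It is not a webtrees module; the class name, the choice to key on a client identifier such as the IP address, and the in-memory storage are all assumptions. A limiter keyed on IP address would also do little against crawlers that use each address only a few times.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client -> timestamps of accepted requests

    def status_for(self, client, now):
        """Return the HTTP status to send: 200 if allowed, 429 if over the limit."""
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 429
        hits.append(now)
        return 200


limiter = RateLimiter(max_requests=2, window_seconds=10)
for second in (0, 1, 2):
    print(second, limiter.status_for("203.0.113.7", now=second))
```

Passing the current time in as an argument keeps the sketch easy to test; a real module would read the clock itself and would need shared storage across PHP requests.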

Peter

1 month 2 days ago #20 by fisharebest
Replied by fisharebest on topic Too many CPU Seconds
The crawlers were using random version numbers - but within a fixed range.

So on my server, I have added blocks for these:

* Chrome versions 1-99
* Firefox versions 1-69
* Opera versions 8-9
* All "Trident"-based browsers

These are all very old/unsupported browsers, so I'm 99% confident that I'm not blocking any legitimate users.

It has been 100% successful in blocking these crawlers.

Until I can be 100% certain that this does not affect any valid users/browsers, I am hesitant to add the same logic to webtrees.
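For readers who want to experiment with the same version ranges, the list above could be expressed roughly as the Python function below. This is a sketch and not the actual server rules, which are not shown in this thread; the regular expressions and the function name are assumptions, and the same caution about blocking legitimate visitors applies.

```python
import re

# (pattern, lowest blocked major version, highest blocked major version)
VERSION_RULES = (
    (re.compile(r"\bChrome/(\d+)\."), 1, 99),
    (re.compile(r"\bFirefox/(\d+)\."), 1, 69),
    (re.compile(r"^Opera/(\d+)\."), 8, 9),
)


def is_blocked(user_agent):
    """True when the user-agent string falls in one of the listed ranges."""
    if "Trident/" in user_agent:
        return True  # all Trident-based browsers
    for pattern, low, high in VERSION_RULES:
        match = pattern.search(user_agent)
        if match and low <= int(match.group(1)) <= high:
            return True
    return False


print(is_blocked("Opera/8.13.(Windows 98; yi-US) Presto/2.9.185 Version/10.00"))
```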
