Google Search Console - Indexing my webtrees 3 weeks 6 days ago #1

  • RobM
  • Topic Author
  • New Member
  • Posts: 8
Hello,
I'm hoping someone can help me sort out what I'm doing wrong. I've been trying to use Google Search Console to index my webtrees site, but I'm not having much luck.
Google reports that 64 pages have been indexed and 1.78K have not been indexed.
The main reason given is "excluded by the 'noindex' tag" (1455 pages excluded).
I have no idea how to stop the noindex tag from being generated or which pages should and which shouldn't be indexed.
Website is:
Rob and Lynda's Family Tree

Any help is appreciated.
Thanks in advance
Rob M
genes.robandlynda.net/famtree
Webtrees Ver. 2.0.19

Google Search Console - Indexing my webtrees 3 weeks 6 days ago #2

  • fisharebest
  • Administrator
  • Posts: 16475
> I've been trying to use Google search console to index my webtrees site, but I'm not having much luck.

Without telling us what you did, or details of what happened, it is impossible to guess.

Assuming you submitted your sitemap, Google's webmaster tools should tell you if there are any issues.

> I have no idea how to stop the noindex tag from being generated or which pages should and which shouldn't be indexed.

webtrees makes this decision for you.

Pages for the home page, individuals, families, sources, and other records are set with "index".
For example, the page for your default individual has: <meta name="robots" content="index,follow">

Other pages (charts, calendar, reports, etc) are set with "noindex".
For example, your calendar page has: <meta name="robots" content="noindex">
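To check which directive a particular page actually carries, you can fetch its HTML and look for the robots meta tag. A minimal Python sketch using only the standard library's `html.parser`; the names `robots_directives` and `is_indexable` are hypothetical helpers, and the sketch only reads `<meta name="robots">` tags (a `noindex` can also arrive via the `X-Robots-Tag` HTTP header, which this does not inspect).

```python
from html.parser import HTMLParser
from typing import List


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attribute names arrive lower-cased; values may be None for bare attributes
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # "index,follow" -> ["index", "follow"]
            self.directives.extend(
                part.strip().lower()
                for part in attr.get("content", "").split(",")
                if part.strip()
            )


def robots_directives(html: str) -> List[str]:
    """Return all robots meta directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def is_indexable(html: str) -> bool:
    """True unless the page says "noindex" (or "none", which implies noindex)."""
    directives = robots_directives(html)
    return "noindex" not in directives and "none" not in directives
```

For example, `is_indexable(urllib.request.urlopen(page_url).read().decode("utf-8"))`, with `page_url` being whichever individual or calendar page you want to test, would show whether that page is marked noindex.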

I also note that your calendar page shows an error message which suggests that you are using PHP 8 with webtrees 2.0.
If that's the case, you need to either upgrade to webtrees 2.1, or downgrade to PHP 7.4.
Greg Roach - fisharebest.webtrees.net

Google Search Console - Indexing my webtrees 3 weeks 4 days ago #3

  • RobM
  • Topic Author
  • New Member
  • Posts: 8
Thanks for the heads up on the calendar page Greg. I will upgrade webtrees.

Unfortunately that doesn't solve the indexing issue. I've just discovered that there seems to be a problem with my robots.txt file.
I can browse to the file and read the contents of the sitemap generated with webtrees
genes.robandlynda.net/robots.txt
However when I submit the website "genes.robandlynda.net" for a URL inspection on Google Search Console the error "Failed: robots.txt unreachable" is returned.
Any ideas?
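Google's robots.txt documentation describes how its crawler treats the HTTP result of fetching robots.txt, and an "unreachable" report generally points to a server error, a 429, or a network-level failure rather than a missing file. The Python sketch below encodes that mapping as summarised here; it is a simplification (redirects are assumed to have been followed already, and Google's caching of older copies is ignored), and the function name and outcome labels are made up for illustration.

```python
from typing import Optional


def robots_fetch_outcome(status: Optional[int]) -> str:
    """Map the result of fetching robots.txt to how Google describes handling it.

    `status` is the final HTTP status code, or None if the request failed at
    the network level (timeout, DNS failure, connection refused).
    """
    if status is None:
        return "unreachable"        # network errors are handled like server errors
    if 200 <= status < 300:
        return "parse rules"        # the file is read and its rules applied
    if status == 429 or 500 <= status < 600:
        return "unreachable"        # crawling treated as fully disallowed for now
    if 400 <= status < 500:
        return "no restrictions"    # treated as if no robots.txt exists
    return "unexpected"             # e.g. an unfollowed 3xx in this simplified sketch
```

So a plain 404 for robots.txt should not produce this error, whereas a timeout, a 5xx, or a request that never reaches your server would.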
Thanks again
Rob M

Last edit: by RobM.

Google Search Console - Indexing my webtrees 3 weeks 3 days ago #4

  • fisharebest
  • Administrator
  • Posts: 16475
> However when I submit the website "genes.robandlynda.net" for a URL inspection on Google Search Console the error "Failed: robots.txt unreachable" is returned.

Check your webserver logs? Did Google try to fetch the file at that time?

Could there have been network/connectivity issues at the time?
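One way to answer "did Google try to fetch the file?" is to filter the access log for crawler requests to robots.txt. A minimal Python sketch, assuming Apache's standard "combined" log format; the function name is hypothetical. A typical Debian/Raspberry Pi OS log lives at `/var/log/apache2/access.log`, but with per-virtual-host logging the requests may land in that vhost's own CustomLog file instead. Newer live URL inspections may identify themselves as `Google-InspectionTool` rather than `Googlebot`, hence the `agent_marker` parameter. User-agent strings can be spoofed, so a match only shows that something claiming to be Google called.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    time: str
    path: str
    status: int
    agent: str


def googlebot_hits(lines: Iterable[str], path_prefix: str = "/",
                   agent_marker: str = "Googlebot") -> Iterator[Hit]:
    """Yield log entries whose user agent contains agent_marker and whose path starts with path_prefix."""
    for line in lines:
        m = COMBINED.match(line)
        if m and agent_marker in m["agent"] and m["path"].startswith(path_prefix):
            yield Hit(m["time"], m["path"], int(m["status"]), m["agent"])
```

Passing an open log file, e.g. `googlebot_hits(log_file, "/robots.txt")`, lists each matching request with its timestamp and status; no output at the moment of a failed inspection would suggest the request never reached this Apache instance.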

Google Search Console - Indexing my webtrees 3 weeks 2 days ago #5

  • RobM
  • Topic Author
  • New Member
  • Posts: 8
Not a connectivity issue, Greg. When I submit "genes.robandlynda.net" or "genes.robandlynda.net/famtree" for a live URL inspection in Google Search Console, nothing shows in either the Apache access.log or error.log, and both return "robots.txt unreachable". If I submit any individual's page there is no problem: "URL is available to Google" is returned and I can submit it for indexing.
This has been doing my head in for months now.
Regards
RobM

Google Search Console - Indexing my webtrees 3 weeks 1 hour ago #6

  • fisharebest
  • Administrator
  • Posts: 16475
So, Google claims it is trying to fetch a URL from your site, but does not get a response.

But your webserver says no request was received.

It is difficult, if not impossible, to guess without direct access to the server, the Google account, etc. Sorry.

Google Search Console - Indexing my webtrees 2 weeks 6 days ago #7

  • RobM
  • Topic Author
  • New Member
  • Posts: 8
Yes, that about sums it up.
Thanks for trying, Greg.
It might be something with how I've configured the virtual hosts, so I'll look there.
Cheers Rob

Google Search Console - Indexing my webtrees 2 weeks 6 days ago #8

  • ddrury
  • Senior Member
  • Posts: 374
Rob,

The robots.txt you link to above looks remarkably like one generated by webtrees 1.7. The files listed there don't exist (unless pretty URLs do something clever). I think you need to remove the physical robots.txt file from the root directory, then enable and configure the sitemaps module. Greg (@fisharebest), can you confirm I'm not talking nonsense here!!
--
Dave

Local: Win 11 Pro, WSL2/Ubuntu20.04.4, Apache 2.4.51, PHP 7.4.26/8.1.7, MySQL 8.0.27
Production: Litespeed 8.0.1, PHP 8.1.9, MySQL 8.0.26

Last edit: by ddrury. Reason: Add info

Google Search Console - Indexing my webtrees 2 weeks 5 days ago #9

  • fisharebest
  • Administrator
  • Posts: 16475

> Greg (@fisharebest) can you confirm I'm not talking nonsense here!!

You're right - this is a webtrees 1.x robots.txt file.
But it shouldn't cause a problem, as it only blocks URLs that no longer exist.
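You can test this kind of claim offline by running the rules through a robots.txt parser and checking the URLs you care about. A minimal sketch with Python's `urllib.robotparser`; the Disallow lines are illustrative webtrees 1.x-style script paths, not a copy of the file discussed here, and the webtrees 2 URL in the test is likewise only an example. Note that `urllib.robotparser` applies the first matching rule in file order, whereas Google uses the most specific match, so results can differ for files that mix Allow and Disallow.

```python
from urllib.robotparser import RobotFileParser

# Illustrative webtrees 1.x-style rules (NOT the actual file from this thread).
OLD_RULES = """\
User-agent: *
Disallow: /famtree/calendar.php
Disallow: /famtree/reportengine.php
Disallow: /famtree/statistics.php
"""


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())   # parse text directly, no network access
    return parser.can_fetch(agent, url)
```

Rules naming old 1.x scripts only match those scripts, so webtrees 2 pages stay fetchable; a blanket `Disallow: /`, by contrast, blocks everything, sitemap.xml included.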

Google Search Console - Indexing my webtrees 2 weeks 5 days ago #10

  • RobM
  • Topic Author
  • New Member
  • Posts: 8
Thanks Dave.
I can't even remember where that robots.txt came from and I didn't even check it to see if the paths were correct.
As for the sitemap, I was hoping that would resolve itself if I managed to fix robots.txt. I enabled the sitemap module and submitted the sitemap to Google last August, with the same result as for robots.txt: "Sitemap could not be read".

Rob

Google Search Console - Indexing my webtrees 2 weeks 5 days ago #11

  • ddrury
  • Senior Member
  • Posts: 374
So what happens if you click the URL shown on the Sitemaps module configuration page and follow down the hierarchy: do you end up seeing the web page pointed to by the URL? If you do, then you need advice from someone more skilled in server administration than me.
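Following the hierarchy can also be scripted. Under the sitemaps.org protocol, a sitemap is either a `<sitemapindex>` whose `<loc>` entries point at child sitemaps, or a `<urlset>` listing pages, both in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace. A minimal Python sketch (the function name is hypothetical) that classifies one downloaded sitemap and rejects a blank or non-XML response, the kind of body a blank page or a cut-off request would give a crawler:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def read_sitemap(body: str) -> Tuple[str, List[str]]:
    """Return ("index", child_sitemap_urls) or ("urlset", page_urls).

    Raises ValueError for an empty body, malformed XML, or an unexpected root element.
    """
    if not body.strip():
        raise ValueError("empty response")
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    if root.tag == SITEMAP_NS + "sitemapindex":
        kind, child = "index", "sitemap"
    elif root.tag == SITEMAP_NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError(f"unexpected root element {root.tag!r}")
    # <sitemap><loc>...</loc></sitemap> or <url><loc>...</loc></url>
    locs = [
        loc.text.strip()
        for loc in root.findall(f"{SITEMAP_NS}{child}/{SITEMAP_NS}loc")
        if loc.text
    ]
    return kind, locs
```

Feeding each child `loc` of an index back through the same function walks the whole tree, and the list it returns is also the set of sub-sitemap URLs you would submit individually.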
Dave

Google Search Console - Indexing my webtrees 2 weeks 4 days ago #12

  • RobM
  • Topic Author
  • New Member
  • Posts: 8
Dave,
If you are referring to the URL generated in Control panel/Modules/Sitemaps, it returns a blank page.

URL: genes.robandlynda.net/famtree/index.php?...amtree%2Fsitemap.xml

Rob

Google Search Console - Indexing my webtrees 2 weeks 4 days ago #13

  • fisharebest
  • Administrator
  • Posts: 16475
That URL gave a valid sitemap when I initially looked at it.

I just tried again, and it still loads OK.

Last edit: by fisharebest.

Google Search Console - Indexing my webtrees 2 weeks 4 days ago #14

  • RobM
  • Topic Author
  • New Member
  • Posts: 8
Thanks Greg
Well, I had better start looking at firewalls.
webtrees is running on a Raspberry Pi.

Rob

Google Search Console - Indexing my webtrees 2 weeks 4 days ago #15

  • fisharebest
  • Administrator
  • Posts: 16475
> webtrees is running on a Raspberry Pi

The sitemap files can be relatively slow to generate.

It is possible that you are getting timeouts.

Results are cached, so it may be that you started a request - which timed out for you, but continued to run in the background so that I could see it later.
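A simple way to test the timeout theory is to time the same request twice: if the first attempt gives up and a second one shortly afterwards returns quickly, that fits the "slow first run, cached afterwards" explanation. A minimal Python sketch using `urllib.request`; the function name and the timeout you pass are assumptions, not values Google publishes here.

```python
import socket
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def timed_fetch(url: str, timeout: float) -> Tuple[Optional[int], float]:
    """Fetch url once; return (HTTP status or None if no response in time, elapsed seconds).

    Note: urllib's timeout applies to each blocking socket operation,
    not to the download as a whole.
    """
    status: Optional[int]
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()                       # include the time taken to receive the body
            status = resp.status
    except urllib.error.HTTPError as exc:     # 4xx/5xx: the server did answer
        status = exc.code
    except (urllib.error.URLError, socket.timeout):
        status = None                         # timeout, refused connection, DNS failure...
    return status, time.monotonic() - start
```

Running it against the sitemap URL with a short timeout, then again with a long one, shows both how long generation takes on the Pi and whether a repeat request is served faster.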

Google Search Console - Indexing my webtrees 2 weeks 4 days ago #16

  • RobM
  • Topic Author
  • New Member
  • Posts: 8
Wouldn't that mean I should be able to go back and see it too?

Google Search Console - Indexing my webtrees 2 weeks 4 days ago #17

  • fisharebest
  • Administrator
  • Posts: 16475
If that's the case - then yes.

Or your server may simply be too busy or underpowered.

Google Search Console - Indexing my webtrees 6 days 20 hours ago #18

Maybe this helps: the coverage of my site wbt.warius.info in the Google index was originally 90% under webtrees 1.7, and has now dropped to under 20% under webtrees 2.1. The cause was many old 1.7 links that were redirected with a 302 and crawled again and again.

I therefore changed robots.txt frequently, and in August I blocked sitemap.xml with "Disallow: /" for 3 days. Since then, Google hasn't read sitemap.xml. Apparently the sitemap has been flagged: Google acknowledges the resubmission but never carries it out (there is no entry in the web server log). So it seems to be a bug at Google.

Workaround: submit all the sub-sitemaps manually. Since then, Google has been crawling again.
Frank
https://wbt.warius.info on Windows Server 2022, IIS10, PHP 8.1.12, MySQL latest

Last edit: by Warius.

Google Search Console - Indexing my webtrees 6 days 20 hours ago #19

  • fisharebest
  • Administrator
  • Posts: 16475
> The cause was many old 1.7 links that were redirected with 302 and crawled again and again.

webtrees generates 301 (moved permanently) redirects.

github.com/fisharebest/webtrees/blob/7fa...ndividualPhp.php#L65

Are you sure you get 302?
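To see exactly which status codes a URL returns, and whether it passes through more than one redirect, you can request it without letting the client follow redirects and record each hop. A minimal Python sketch using `http.client`, which never follows redirects on its own; the function name is hypothetical.

```python
import http.client
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlsplit


def redirect_chain(url: str, max_hops: int = 10,
                   timeout: float = 10.0) -> List[Tuple[str, int, Optional[str]]]:
    """Follow redirects by hand; return (url, status, Location header) for every hop."""
    hops: List[Tuple[str, int, Optional[str]]] = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                      else http.client.HTTPConnection)
        conn = conn_class(parts.netloc, timeout=timeout)
        target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
        try:
            conn.request("GET", target)
            resp = conn.getresponse()
            status, location = resp.status, resp.getheader("Location")
        finally:
            conn.close()
        hops.append((url, status, location))
        if status not in (301, 302, 303, 307, 308) or not location:
            break
        url = urljoin(url, location)   # Location may be relative
    return hops
```

A 301 on the first hop means a permanent redirect; a 302, or several hops before reaching a 200, is what a crawler would see as temporary or chained redirects.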

Google Search Console - Indexing my webtrees 6 days 6 hours ago #20

Yes, because 302 is the default in the Illuminate\Contracts\Routing\ResponseFactory interface.

Two months ago, after tracing, I made two changes:

\app\Http\Middleware\NoRouteFound.php:
23 + use Fig\Http\Message\StatusCodeInterface;
57 - return redirect(route(HomePage::class));
58 + return redirect(route(HomePage::class), StatusCodeInterface::STATUS_GONE); // or STATUS_NOT_FOUND

\app\Helpers\functions.php:
98 - function redirect(string $url, int $code = StatusCodeInterface::STATUS_FOUND): ResponseInterface
98 + function redirect(string $url, int $code = StatusCodeInterface::STATUS_MOVED_PERMANENTLY): ResponseInterface

This works, and reduces the number of requests from Google.

Another issue is that HomePage::class redirects to "/" rather than to the default tree "/tree/Warius", which causes another redirect.
Frank