TOPIC:

Sitemap and Robots.txt 3 months 2 weeks ago #1

  • slateronline (Topic Author)
  • Junior Member
  • Posts: 125
When I go to the sitemap module to check the auto-generated sitemap for my site I see a blank page - no error - but a blank page. If I look at the content with dev tools then I see a mixed http/https type error (but my config file now has the correct URL for https). I am sure in the past I saw a typical site xml outline of my site on this auto-generated page....

Also, I would like to use the auto generated robots.txt function but I'm not sure how to use it. Searching on here shows me specific URLs for other users' sites, and I am not sure which characters from the last part of those URLs to include and adapt for mine. My install is at the top of the tree for the sub-domain that I use within cPanel, so I am not sure where that leaves me in terms of automating the robots.txt generation...

Thanks
Neil
latest v2.1.16 | php v8.2 | mysql v7.4.28
Site: familytree.slateronline.com

Sitemap and Robots.txt 3 months 2 weeks ago #2

  • fisharebest
  • Administrator
  • Posts: 17129
> Also, I would like to use the auto generated robots.txt function but I'm not sure how to use it.

It is generated automatically (if you use pretty URLs).

You can see it here: familytree.slateronline.com/robots.txt

You'll see that it includes a link to your sitemap.xml file, so search engines should find the sitemap automatically.
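
To check the same thing on another install, a minimal Python sketch along these lines could pull the Sitemap entries out of a robots.txt body. The function name sitemap_urls and the sample text are made up for illustration; they are not webtrees code and not the actual file served by the site above.

```python
def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs listed in 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop comments, then split "Field: value" on the first colon only,
        # because the value itself (a URL) contains colons.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin
Sitemap: https://example.com/sitemap.xml
"""

print(sitemap_urls(SAMPLE))
```

If the list comes back empty for a real robots.txt, the file does not advertise a sitemap, and a search engine would have to be told about it some other way.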

The sitemap page looks OK to me.
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

Sitemap and Robots.txt 3 months 2 weeks ago #3

  • slateronline (Topic Author)
  • Junior Member
  • Posts: 125
Thanks Greg. I see the robots.txt now. However, sitemap.xml still shows just a "blank white page". I must be missing something, as it clearly has content and I just don't know how to see it. In Chrome I can normally view the source, but in this case I don't get any right-click functionality. Maybe I need to clear the cache, but it's weird.

Cheers
Neil
latest v2.1.16 | php v8.2 | mysql v7.4.28
Site: familytree.slateronline.com

Sitemap and Robots.txt 3 months 2 weeks ago #4

  • slateronline (Topic Author)
  • Junior Member
  • Posts: 125
When I view the "empty" sitemap page with dev tools I see this:

"sitemap.xml:2 Unsafe attempt to load URL familytree.slateronline.com/sitemap.xsl from frame with URL familytree.slateronline.com/sitemap.xml. Domains, protocols and ports must match." (What the copy/paste above does not show is the http/https mismatch between the two URLs.) So it is something related to my having put SSL on the server some time ago.
Neil
latest v2.1.16 | php v8.2 | mysql v7.4.28
Site: familytree.slateronline.com

Last edit: by slateronline.

Sitemap and Robots.txt 3 months 2 weeks ago #5

  • fisharebest
  • Administrator
  • Posts: 17129
Did you recently change the base_url on your site from http: to https:?

The sitemap files are cached. The XSL (formatting) files will probably have http links in them.

Hence it is only a problem displaying them in your browser. Google/Bing/etc. will read them OK.
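
The browser error comes from the same-origin rule: the sitemap and its XSL stylesheet must share protocol, host and port. A rough Python sketch of that check follows. It assumes the cached sitemap references its stylesheet through an xml-stylesheet processing instruction with an href attribute; the helper names and the example.com URLs are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Matches the href of an <?xml-stylesheet ... href="..."?> processing instruction.
STYLESHEET_HREF = re.compile(r'<\?xml-stylesheet\b[^>]*?\bhref=["\']([^"\']+)["\']')

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return (scheme, host, port), the triple a browser compares."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def stylesheet_mismatch(sitemap_url: str, sitemap_xml: str) -> Optional[str]:
    """Return the stylesheet URL if its origin differs from the sitemap's, else None."""
    match = STYLESHEET_HREF.search(sitemap_xml)
    if match is None:
        return None
    # Resolve relative hrefs against the sitemap URL, as a browser would.
    xsl_url = urljoin(sitemap_url, match.group(1))
    return xsl_url if origin(xsl_url) != origin(sitemap_url) else None


# Hypothetical cached sitemap that still points at an http:// stylesheet.
STALE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?xml-stylesheet type="text/xsl" href="http://example.com/sitemap.xsl"?>\n'
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
)

print(stylesheet_mismatch("https://example.com/sitemap.xml", STALE))
```

A crawler that ignores the stylesheet is unaffected by such a mismatch, which fits the point that only the browser display breaks.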

Wait a couple of weeks and they will be regenerated.
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

Sitemap and Robots.txt 3 months 1 week ago #6

  • slateronline (Topic Author)
  • Junior Member
  • Posts: 125
Thanks Greg. Indeed I can now see the sitemap OK in my browser.

The next issue appears to be that Bing does not like that sitemap. I submitted it and it has scanned it (found the 9 subordinate sitemaps) but finds zero page URLs. I'm trying to figure out why, as when I click through the sitemaps I get to the base URLs. I have just now submitted one of the lower level sitemaps to Bing to see if it just can't handle a sitemap hierarchy, but that seems like basic functionality. Have you seen it before please?
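
One way to narrow this down without waiting for Bing is to count the entries locally. The sketch below is an illustration only: it uses the standard sitemaps.org namespace, the function names are invented here, and the two XML samples are hypothetical stand-ins for a real index and a real child sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml: str) -> list[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


def page_urls(sitemap_xml: str) -> list[str]:
    """Return the <loc> of every <url> entry in an ordinary sitemap file."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


# Hypothetical samples, for illustration only.
INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-2.xml</loc></sitemap>
</sitemapindex>"""

CHILD = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page-a</loc></url>
</urlset>"""

print(len(child_sitemaps(INDEX)), "child sitemaps")
print(len(page_urls(CHILD)), "page URLs in the child")
```

If the child files list page URLs locally but the search engine reports zero, the problem is more likely in how the files are fetched or processed than in their content.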
Neil
latest v2.1.16 | php v8.2 | mysql v7.4.28
Site: familytree.slateronline.com

Sitemap and Robots.txt 3 months 1 week ago #7

  • fisharebest
  • Administrator
  • Posts: 17129
Your sitemap files look OK to me.

> Have you seen it before please?

No.
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

Sitemap and Robots.txt 3 months 1 week ago #8

  • slateronline (Topic Author)
  • Junior Member
  • Posts: 125

Here is what Bing is stating about my current sitemap
Neil
latest v2.1.16 | php v8.2 | mysql v7.4.28
Site: familytree.slateronline.com

Sitemap and Robots.txt 3 months 1 week ago #9

  • Lars1963
  • Junior Member
  • Posts: 181

> Here is what Bing is stating about my current sitemap

I can confirm this for both of my sites too. Not only does Bing say the sitemap is "invalid"; Google gives me an "invalid" too. So I have disabled it for now.
Lars van Ravenzwaaij - see my family tree at www.ravenzwaaij.info

Last edit: by Lars1963.

Sitemap and Robots.txt 3 months 1 week ago #10

  • fisharebest
  • Administrator
  • Posts: 17129
There are plenty of online sitemap validators. e.g. www.xml-sitemaps.com/validate-xml-sitemap.html

This one says that both your sitemaps are fine.

Do your site logs show the sitemap files being downloaded by Google/Bing successfully?
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

Sitemap and Robots.txt 3 months 1 week ago #11

  • Lars1963
  • Junior Member
  • Posts: 181

> There are plenty of online sitemap validators. e.g. www.xml-sitemaps.com/validate-xml-sitemap.html
>
> This one says that both your sitemaps are fine.
>
> Do your site logs show the sitemap files being downloaded by Google/Bing successfully?

As suggested, I used a validator. It shows my sitemaps are fine. So I disabled the sitemaps, deleted them from both Bing and Google, re-enabled them, and added them to Bing and Google again.

As a result Bing is working as intended, but Google says the sitemap "could not be retrieved". Apart from that, Google does not index my site. It gives me this status:
Duplicate - not specified as canonical by the user
I don't know how to solve that. See the attached screenshots.
Lars van Ravenzwaaij - see my family tree at www.ravenzwaaij.info

Last edit: by Lars1963.

Sitemap and Robots.txt 2 months 1 week ago #12

  • _the_mars_
  • New Member
  • Posts: 5

> ... but Google says the sitemap "could not be retrieved". Apart from that, Google does not index my site...

I also use webtrees, and I like it.
But Google does not like the embedded stylesheet in the sitemap XML. So can you please remove it from the sitemaps?
Then I would not need to patch the webtrees software. ;-)

Or, if you like the stylesheet, please leave it out when the user-agent is Googlebot.

Sitemap and Robots.txt 2 months 1 week ago #13

  • Lars1963
  • Junior Member
  • Posts: 181

> I also use webtrees, and I like it.
> But Google does not like the embedded stylesheet in the sitemap XML. So can you please remove it from the sitemaps?

Just asking: is that the real reason why Google is not retrieving the sitemaps? Bing doesn't have a problem with that.

If so, can you open an issue on GitHub for Greg?
Lars van Ravenzwaaij - see my family tree at www.ravenzwaaij.info

Last edit: by Lars1963.

Sitemap and Robots.txt 2 months 1 week ago #14

  • _the_mars_
  • New Member
  • Posts: 5

> ... is that the real problem why Google is not retrieving the sitemaps? ...

Yes. I tried the separate sitemaps (not the sitemap index, but the ones listed in it). At first Google could not index them. But after I removed the stylesheet line in the source code (and cleared the webtrees cache), Google finally understood the sitemaps.

Sitemap and Robots.txt 2 months 1 week ago #15

  • Lars1963
  • Junior Member
  • Posts: 181

> > ... is that the real problem why Google is not retrieving the sitemaps? ...
>
> Yes. I tried the separate sitemaps (not the sitemap index, but the ones listed in it). At first Google could not index them. But after I removed the stylesheet line in the source code (and cleared the webtrees cache), Google finally understood the sitemaps.

Which line(s) did you remove? There are several lines that mention the stylesheet.
Lars van Ravenzwaaij - see my family tree at www.ravenzwaaij.info

Last edit: by Lars1963.

Sitemap and Robots.txt 2 months 1 week ago #16

  • Franz Frese
  • Premium Member
  • Posts: 828
I have not noticed any problem with Google for my site freris.de/sitemap.xml, even though it has some CSS.

Sitemap and Robots.txt 2 months 1 week ago #17

  • _the_mars_
  • New Member
  • Posts: 5
CSS is not the problem; the XML stylesheets (XSLT) are.

In
./resources/views/modules/sitemap/sitemap-index-xml.phtml
I removed line 29
and in
./resources/views/modules/sitemap/sitemap-file-xml.phtml
I removed line 19.
I am running webtrees 2.1.16.
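
The thread does not show what those two template lines contain. Assuming they emit an xml-stylesheet processing instruction, the effect of removing them can be approximated on the generated output with a sketch like the one below. This is not the patch described above and not webtrees code; strip_stylesheet and the sample document are hypothetical.

```python
import re

# Matches one <?xml-stylesheet ...?> processing instruction plus a trailing newline.
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\b.*?\?>\r?\n?", re.DOTALL)


def strip_stylesheet(xml_text: str) -> str:
    """Return the XML with any xml-stylesheet processing instruction removed."""
    return STYLESHEET_PI.sub("", xml_text)


# Hypothetical sitemap output, for illustration only.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?xml-stylesheet type="text/xsl" href="/sitemap.xsl"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
)

print(strip_stylesheet(SAMPLE))
```

Comparing how a validator or search engine treats the stripped and unstripped versions is one way to test whether the stylesheet reference matters at all.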

Sitemap and Robots.txt 2 months 1 week ago #18

  • fisharebest
  • Administrator
  • Posts: 17129
Half the web runs on WordPress. WordPress sitemap files contain stylesheets.
So, I don't believe that stylesheets on their own are a problem for Google.

I have several sites with stylesheets in sitemap files - and all worked OK last time I looked.

Stylesheets exist to convert XML to HTML for browsers, and Google wouldn't want to convert them.

So it is not obvious to me what the issue is with your site.

Maybe there is something else co-incidental?
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

Sitemap and Robots.txt 2 months 1 week ago #19

  • Lars1963
  • Junior Member
  • Posts: 181

> Half the web runs on WordPress. WordPress sitemap files contain stylesheets.
> So, I don't believe that stylesheets on their own are a problem for Google.
>
> I have several sites with stylesheets in sitemap files - and all worked OK last time I looked.
>
> Stylesheets exist to convert XML to HTML for browsers, and Google wouldn't want to convert them.
>
> So it is not obvious to me what the issue is with your site.
>
> Maybe there is something else co-incidental?

I can confirm the stylesheets are not the problem. Removing the lines as suggested doesn't solve my problem.

After adding the sitemap to Google's Search Console, it immediately says: "sitemap can't be read"

EDIT: Google seems to try to get the sitemap though
ravenzwaaij.info anon-0-0-0-253.ip6.invalid - - [28/Mar/2023:17:48:09 +0200] "GET /admin HTTP/1.1" 200 4486 "https://www.ravenzwaaij.info/tree/ravenzwaaij/my-page" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54"
ravenzwaaij.info anon-0-0-0-253.ip6.invalid - - [28/Mar/2023:17:48:16 +0200] "GET /module/sitemap/Admin HTTP/1.1" 200 2010 "https://www.ravenzwaaij.info/admin" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54"
ravenzwaaij.info anon-0-0-0-253.ip6.invalid - - [28/Mar/2023:17:48:50 +0200] "GET /sitemap.xml HTTP/1.1" 200 272 "https://search.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54"
ravenzwaaij.info anon-0-0-0-253.ip6.invalid - - [28/Mar/2023:17:48:51 +0200] "GET /sitemap.xsl HTTP/1.1" 200 824 "https://www.ravenzwaaij.info/sitemap.xml" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54"
ravenzwaaij.info anon-144-187-87-33.ip6.invalid - - [28/Mar/2023:17:54:13 +0200] "GET /tree/ravenzwaaij HTTP/1.1" 200 5631 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54"
ravenzwaaij.info anon-144-187-87-33.ip6.invalid - - [28/Mar/2023:17:54:16 +0200] "GET /tree/ravenzwaaij/tree-page-block?block_id=63 HTTP/1.1" 200 1894 "https://www.ravenzwaaij.info/tree/ravenzwaaij" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54"
ravenzwaaij.info anon-144-187-87-33.ip6.invalid - - [28/Mar/2023:17:54:16 +0200] "GET /tree/ravenzwaaij/tree-page-block?block_id=41 HTTP/1.1" 200 1683 "https://www.ravenzwaaij.info/tree/ravenzwaaij" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54"
ravenzwaaij.info anon-144-187-87-33.ip6.invalid - - [28/Mar/2023:17:54:40 +0200] "GET /sitemap.xml HTTP/1.1" 200 237 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54"
Lars van Ravenzwaaij - see my family tree at www.ravenzwaaij.info

Last edit: by Lars1963.

Sitemap and Robots.txt 2 months 1 week ago #20

  • fisharebest
  • Administrator
  • Posts: 17129
In my logs, Google uses its own crawler, "Googlebot", to fetch the sitemap.xml files:
webtrees/log/dev.webtrees.net-access.log:66.249.76.104 - - [28/Mar/2023:13:05:52 +0000] "GET /demo-dev/sitemap.xml HTTP/1.1" 200 282 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

> EDIT: Google seems to try to get the sitemap though

Are you sure? Your logs do not show "Googlebot".
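
A quick way to run the same check on a larger log is sketched below in Python. It assumes lines end in the usual request, status, size, referer and user-agent fields, as the two excerpts in this thread do; the regular expression, the function name and its parameter are illustrative only. A user-agent string can be faked, so a match suggests, but does not prove, that the request came from Google.

```python
import re

# Tail of a combined-style log line:
# "METHOD path protocol" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)


def sitemap_fetches(log_lines, agent_substring="Googlebot"):
    """Yield (path, status) for sitemap.xml requests whose user agent contains agent_substring."""
    for line in log_lines:
        match = LOG_TAIL.search(line)
        if match is None:
            continue  # Line is not in the expected format.
        is_sitemap = match["path"].endswith("sitemap.xml")
        if is_sitemap and agent_substring in match["agent"]:
            yield match["path"], int(match["status"])


# Two lines taken from the excerpts posted in this thread.
GOOGLEBOT_LINE = (
    '66.249.76.104 - - [28/Mar/2023:13:05:52 +0000] '
    '"GET /demo-dev/sitemap.xml HTTP/1.1" 200 282 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
BROWSER_LINE = (
    'ravenzwaaij.info anon-0-0-0-253.ip6.invalid - - [28/Mar/2023:17:48:50 +0200] '
    '"GET /sitemap.xml HTTP/1.1" 200 272 "https://search.google.com/" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54"'
)

print(list(sitemap_fetches([GOOGLEBOT_LINE, BROWSER_LINE])))
```

Only the first line is reported. The second request carries a search.google.com referer but an Edge browser user agent, so it looks like a person clicking through from the console rather than the crawler.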
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net