TOPIC:

google indexing 3 weeks 21 hours ago #21

  • comet48
  • comet48's Avatar Topic Author
  • Offline
  • New Member
  • Posts: 67
Not sure how to do a) :(

Tried b) without completing a) and site wouldn't load.

Cloudflare allows me to turn the proxy off on the webtrees site. Tried that, no joy.

Webtrees is running in a docker on Unraid. All webtrees traffic is forwarded to an NGINX proxy server, which forwards it to the webtrees docker on the 192.168.1.0/24 network. Unraid recommends putting all the dockers on their own private network, so that's where the 172. network comes from. Unraid does the forwarding from 192. to 172.

I can put the webtrees docker on the 192., but reading your response, that doesn't look like it will make any difference.

Can anyone help with the cloudflare config, or suggest another alternative?

google indexing 3 weeks 21 hours ago #23

  • fisharebest
  • fisharebest's Avatar
  • Offline
  • Administrator
  • Posts: 17010
So, you are using cloudflare *and* another proxy.

That's two things trying to hide the visitor's IP address!

Cloudflare sets the "real IP" in an HTTP header "cf-connecting-ip".

> webtrees running in a docker on Unraid.

I don't know what Unraid is - and am at work so don't have time to google it.

But you'll presumably have a webserver that listens for external connections, and forwards them to docker.

I would have presumed this webserver will forward all request headers to docker - so it should see this header.

Do you see it in your "php info" page?
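
If no "php info" page is to hand, the same check can be sketched in Python with only the standard library: a throwaway server that echoes back every request header it receives, so you can see whether `CF-Connecting-IP` survives the trip through the proxies. This is only an illustration; the helper names and port 8080 are assumptions of this sketch, not part of webtrees or Cloudflare.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def find_header(headers, name="cf-connecting-ip"):
    """Return the value of a header, matching its name case-insensitively."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


class HeaderDump(BaseHTTPRequestHandler):
    def do_GET(self):
        # Report the direct peer address, then every header that arrived.
        lines = [f"peer: {self.client_address[0]}"]
        lines += [f"{key}: {value}" for key, value in self.headers.items()]
        lines.append(f"cf-connecting-ip seen: {find_header(self.headers)}")
        body = "\n".join(lines).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    # Port 8080 is an arbitrary choice for this sketch.
    HTTPServer(("0.0.0.0", port), HeaderDump).serve_forever()
```

Call `serve()` on the machine behind the proxies, point the proxy at it temporarily, and load the page in a browser. If the last line reports `None`, something in front of it is dropping the header.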
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

google indexing 3 weeks 17 hours ago #25

  • comet48
  • comet48's Avatar Topic Author
  • Offline
  • New Member
  • Posts: 67
Sorry for the multiple postings. I didn't see the page 2 continuation at the bottom of the thread.

google indexing 3 weeks 17 hours ago #26

  • comet48
  • comet48's Avatar Topic Author
  • Offline
  • New Member
  • Posts: 67
So Cloudflare substitutes its own IP for my public IP address, so that's what the user sees. Everything is still forwarded to my router on ports 80 or 443. The router forwards everything to the NGINX proxy server, which extracts the information it needs to determine the URL and forwards the incoming data to the IP address associated with, in this case, the webtrees docker. Unraid suggests running dockers on their own private network, in this case the 172. net, keeping them isolated from the local LAN (NGINX cut attached). Also attached is a snippet of the docker IP assignments. It's a great system. Just need to keep 80 and 443 open.
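
The chain described above can be modelled roughly as follows. This is a toy simulation, not a description of how Cloudflare, NGINX or Unraid are implemented: every address is a placeholder (203.0.113.7 and 198.51.100.10 come from documentation ranges, 172.18.0.1 stands in for the docker-side address), and the function names are invented for the sketch. The point it tries to show is that each proxy opens a fresh connection, so the final server only sees the last hop as its peer, and the visitor's address survives only inside a header.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    peer_ip: str                                  # whoever opened this connection
    headers: dict = field(default_factory=dict)   # HTTP request headers


def via_cloudflare(req, cloudflare_ip="198.51.100.10"):
    """Cloudflare connects from its own address and records the visitor's."""
    headers = dict(req.headers)
    headers["CF-Connecting-IP"] = req.peer_ip  # replaces any client-supplied value
    return Request(peer_ip=cloudflare_ip, headers=headers)


def via_reverse_proxy(req, proxy_ip, pass_headers=True):
    """A reverse proxy opens a new connection, so the peer address changes again."""
    headers = dict(req.headers) if pass_headers else {}
    return Request(peer_ip=proxy_ip, headers=headers)


visitor = Request(peer_ip="203.0.113.7")
at_webtrees = via_reverse_proxy(via_cloudflare(visitor), proxy_ip="172.18.0.1")

print(at_webtrees.peer_ip)                      # 172.18.0.1
print(at_webtrees.headers["CF-Connecting-IP"])  # 203.0.113.7
```

With `pass_headers=False` the header is lost at the second hop, which is the situation to rule out in the proxy configuration.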

I tried assigning webtrees to the 192. LAN network, but it wouldn't load. I set it back to dockernet, the private docker network, but it still won't load. Getting bad gateway, error code 502. Driving me crazy :) I've been here before, and it required a complete new install. Very finicky.

google indexing 2 weeks 6 days ago #27

  • fisharebest
  • fisharebest's Avatar
  • Offline
  • Administrator
  • Posts: 17010
The question is one of headers - not ports or forwarding.

You said you use cloudflare.

Cloudflare sets an HTTP header.

I asked if you could see this header in your control panel.

> The router forwards everything to the NGINX proxy server which extracts which information to determine the url and forwards the incoming data to the IP address associated with the webtrees docker in this case.

So, if you cannot see the Cloudflare header, you probably need to look at this NGINX config.
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

google indexing 2 weeks 4 days ago #28

  • comet48
  • comet48's Avatar Topic Author
  • Offline
  • New Member
  • Posts: 67
Here is some info from cloudflare.

developers.cloudflare.com/support/troubl...riginal-visitor-ips/

"When your website traffic is routed through the Cloudflare network, we act as a reverse proxy. This allows Cloudflare to speed up page load time by routing packets more efficiently and caching static resources (images, JavaScript, CSS, etc.). As a result, when responding to requests and logging them, your origin server returns a Cloudflare IP address.

For example, if you install applications that depend on the incoming IP address of the original visitor, a Cloudflare IP address is logged by default. The original visitor IP address appears in an appended HTTP header called CF-Connecting-IP. By following our web server instructions, you can log the original visitor IP address at your origin server. If this HTTP header is not available when requests reach your origin server, check your Transform Rules and Managed Transforms configuration.

The diagram below illustrates the different ways that IP addresses are handled with and without Cloudflare."

So two questions:
1. Do you access the header called CF-Connecting-IP?
2. What does trusted_headers="cf-connecting-ip" do?

Thanks for your patience

My 2.x webtrees is not up. Going to reinstall once I understand precisely what I need to do:)

google indexing 2 weeks 4 days ago #29

  • bertkoor
  • bertkoor's Avatar
  • Offline
  • Platinum Member
  • Greetings from Utrecht, Holland
  • Posts: 2939
According to the instructions, it's simple:
webtrees.net/install/cloudflare/

When using this type of service, all requests to your site will come from Cloudflare, and not from the remote user. Hence you will only see Cloudflare’s IP address in your logs - not the IP address of the actual user.

Cloudflare provides details of the remote user in an HTTP header, and you must tell webtrees that it is OK to use/trust this header.

Once you have finished the setup process, you will have a file data/config.ini.php. Add the following line to that file:

trusted_headers="cf-connecting-ip"


Without that, the header is stripped. So webtrees won't know about it and it doesn't work. It then cannot distinguish bots from normal users, for example.

Just do it. The alternative is to not use CloudFlare.
stamboom.BertKoor.nl runs on webtrees v1.7.13

google indexing 2 weeks 4 days ago #30

  • fisharebest
  • fisharebest's Avatar
  • Offline
  • Administrator
  • Posts: 17010
> 1. Do you access the header called CF-Connecting-IP?

webtrees will look in this header for the IP address - but only if you tell it to.
Use the trusted_headers setting for this.

I asked if you could see this header in the control panel.
You shouldn't trust it unless (a) you can see it and (b) you expect to see it.

> 2. What does trusted_headers="cf-connecting-ip" do?

Any visitor to your site can set this header to any value they choose.

So, "trusted" means that you trust the header to be set correctly.

i.e. you trust cloudflare to have ignored any header with this name from the original request, and to have added this header with the IP address of the connecting computer.
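
The idea of "trusting" a header can be sketched like this. To be clear, this is not webtrees' actual code; it is a minimal Python illustration of the principle, and the function name and arguments are assumptions. A header is only consulted if it has been explicitly listed as trusted, and its value is only used if it parses as an IP address; otherwise the address of the directly connected peer is used.

```python
import ipaddress


def client_ip(remote_addr, headers, trusted_headers=()):
    """Pick the visitor address: a trusted header if present and valid, else the peer."""
    lowered = {key.lower(): value for key, value in headers.items()}
    for name in trusted_headers:
        value = lowered.get(name.lower())
        if value is None:
            continue
        try:
            # Only accept something that is syntactically an IP address.
            return str(ipaddress.ip_address(value.strip()))
        except ValueError:
            continue
    return remote_addr


headers = {"CF-Connecting-IP": "203.0.113.7"}

# Header present but not trusted: the proxy's address is used.
print(client_ip("172.18.0.1", headers))                         # 172.18.0.1

# Header trusted: the visitor's address is used.
print(client_ip("172.18.0.1", headers, ["cf-connecting-ip"]))   # 203.0.113.7
```

The sketch also shows why the setting is dangerous on a site that is not behind Cloudflare: whatever the visitor puts in that header would be believed.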
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

google indexing 2 weeks 4 days ago #31

  • comet48
  • comet48's Avatar Topic Author
  • Offline
  • New Member
  • Posts: 67
Thanks for the clear explanation. I set up a new install at webtrees.haleycentral.com. It works fine with Firefox and the Apple browser. Really screwy with Chrome: some stuff displays, other stuff is just a spinning circle.

I included some header and apache environment attachments.

config.ini.php is as follows:
; <?php return; ?> DO NOT DELETE THIS LINE
dbtype="mysql"
dbhost="192.168.1.92"
dbport="3306"
dbuser="xxxxxx"
dbpass="xxxxxxx"
dbname="gedcom"
tblpfx="wt_"
base_url="webtrees.haleycentral.com"
rewrite_urls="0"
trusted_headers="cf-connecting-ip"
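
A quick way to double-check that a file like the one above really contains the setting is to parse it. The following is a rough Python sketch using the standard `configparser` module; the function name and the dummy `[webtrees]` section are assumptions made for the sketch (the real file has no section header), and the sample text is a cut-down copy of the listing above.

```python
import configparser


def read_webtrees_config(text):
    """Parse config.ini.php-style text into a dict, stripping surrounding quotes."""
    parser = configparser.ConfigParser(interpolation=None)
    # The file has no [section] header, so prepend a dummy one for configparser.
    # The leading "; <?php return; ?>" line is treated as an ordinary comment.
    parser.read_string("[webtrees]\n" + text)
    return {key: value.strip('"') for key, value in parser["webtrees"].items()}


sample = '''; <?php return; ?> DO NOT DELETE THIS LINE
base_url="webtrees.haleycentral.com"
rewrite_urls="0"
trusted_headers="cf-connecting-ip"
'''

config = read_webtrees_config(sample)
print(config.get("trusted_headers"))  # cf-connecting-ip
```

If `config.get("trusted_headers")` comes back as `None`, the line is missing or misspelled.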

Sitemap is on, but not populating:
webtrees.haleycentral.com/index.php?route=%2Fsitemap.xml

"Almost there" I hope.
Thanks again

google indexing 2 weeks 4 days ago #32

  • fisharebest
  • fisharebest's Avatar
  • Offline
  • Administrator
  • Posts: 17010
Sitemap files are cached for (7? 10?) days.

I guess you tried to view the sitemap before enabling it for any trees?
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

google indexing 2 weeks 1 day ago #33

  • comet48
  • comet48's Avatar Topic Author
  • Offline
  • New Member
  • Posts: 67
Now that this site, genealogy.haleycentral.com, seems to be up and running, I'm back to the original issue: bot indexing. A sample log snippet is below. Does this look OK? Again, this is behind Cloudflare, but this time with
trusted_headers="cf-connecting-ip"
in the .ini file. One thing I see is that the Bing bot completely ignores the tree variable.


webtrees:80 127.0.0.1 - - [09/Mar/2023:14:38:49 -0800] "GET / HTTP/1.1" 302 669 "-" "curl/7.74.0"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:15 -0800] "GET /family.php?famid=F7855&ged=tree1 HTTP/1.1" 406 487 "genealogy.haleycentral.com/individual.php?pid=I22958&ged=tree1" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
webtrees:80 127.0.0.1 - - [09/Mar/2023:14:39:20 -0800] "GET / HTTP/1.1" 302 669 "-" "curl/7.74.0"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:21 -0800] "GET /individual.php?pid=I1073&ged=tree1 HTTP/1.1" 406 487 "genealogy.haleycentral.com/individual.php?pid=I1078&ged=tree1" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:28 -0800] "GET /family.php?famid=F6151&ged=tree1 HTTP/1.1" 406 487 "genealogy.haleycentral.com/individual.ph...jax&module=relatives" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:32 -0800] "GET /individual.php?pid=I102&ged=tree1 HTTP/1.1" 406 487 "genealogy.haleycentral.com/individual.php?pid=I5801&ged=tree1" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
webtrees:80 127.0.0.1 - - [09/Mar/2023:14:39:51 -0800] "GET / HTTP/1.1" 302 669 "-" "curl/7.74.0"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:54 -0800] "GET /individual.php?pid=i319757 HTTP/1.1" 400 59429 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:55 -0800] "GET /family.php?famid=F6004&ged=tree1 HTTP/1.1" 406 487 "genealogy.haleycentral.com/family.php?famid=F6006&ged=tree1" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:55 -0800] "GET /individual.php?pid=i320517 HTTP/1.1" 400 59429 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:56 -0800] "GET /family.php?famid=f504094195 HTTP/1.1" 400 59427 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:56 -0800] "GET /individual.php?pid=i317605 HTTP/1.1" 400 59429 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:57 -0800] "GET /individual.php?pid=i322313 HTTP/1.1" 400 59429 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"
webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:57 -0800] "GET /family.php?famid=F6063&ged=tree1 HTTP/1.1" 406 487 "genealogy.haleycentral.com/individual.php?pid=I8976&ged=tree1" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"
webtrees:80 172.18.0.1 - -

google indexing 2 weeks 1 day ago #34

  • bertkoor
  • bertkoor's Avatar
  • Offline
  • Platinum Member
  • Greetings from Utrecht, Holland
  • Posts: 2939
Looking at the URLs, these are in the old format and the indi/fam IDs start with a lowercase i/f. PetalBot uses uppercase.
So it could be that Bing is fetching old URLs that it remembered from the past, or that are referenced from elsewhere.

Do you have "pretty" urls enabled?

Can't see anything currently, every request to your site ends with

Origin is unreachable
Error code 523
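
The observation about old-format URLs and lowercase IDs can be checked across a whole log with something like the sketch below. It is only an illustration in Python: the function name is made up, and the two page names and their ID parameters are taken from the log lines quoted earlier in this thread, not from any webtrees documentation.

```python
from urllib.parse import parse_qs, urlsplit

# Old-style pages seen in the log, and the query parameter holding the record ID.
LEGACY_PAGES = {"/individual.php": "pid", "/family.php": "famid"}


def describe_legacy_url(url):
    """Return (record_id, has_tree, id_is_lowercase) for an old-style URL, else None."""
    parts = urlsplit(url)
    id_param = LEGACY_PAGES.get(parts.path)
    if id_param is None:
        return None
    query = parse_qs(parts.query)
    record_id = query.get(id_param, [""])[0]
    return record_id, "ged" in query, record_id[:1].islower()


print(describe_legacy_url("/individual.php?pid=i319757"))
# ('i319757', False, True)   <- lowercase ID, no tree parameter

print(describe_legacy_url("/family.php?famid=F7855&ged=tree1"))
# ('F7855', True, False)     <- uppercase ID, tree given
```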

stamboom.BertKoor.nl runs on webtrees v1.7.13

google indexing 2 weeks 21 hours ago #35

  • comet48
  • comet48's Avatar Topic Author
  • Offline
  • New Member
  • Posts: 67
Pretty URLs are on. We had major storms here in the San Francisco area, so the site is only just back up.

google indexing 2 weeks 21 hours ago #36

  • Sir Peter
  • Sir Peter's Avatar
  • Offline
  • Senior Member
  • Posts: 406
> Looking at the URLs, these are in the old format and the indi/fam IDs start with a lowercase i/f. PetalBot uses uppercase.
> So it could be that Bing is fetching old URLs that it remembered from the past, or that are referenced from elsewhere.

There's a module called "Legacy URLs" to redirect old URLs from webtrees version 1. Maybe activate it and test manually whether these old urls are being forwarded correctly?
Peter

google indexing 2 weeks 20 hours ago #37

  • fisharebest
  • fisharebest's Avatar
  • Offline
  • Administrator
  • Posts: 17010
The "406" status indicates that PetalBot is being blocked by webtrees - by the "BadBotBlocker" module.

> A sample log snippet is below. Does this look OK?

I really want to see the entry where google fetches the sitemap.xml file.

Does it also get a 406 response?
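
To pull those entries out of a large log, a small filter along these lines might help. It is a Python sketch that assumes the log format seen in the snippet posted earlier (virtual host, peer address, timestamp, request, status, size, referrer, user agent); the function name is invented, and the sample line is a shortened version of one of the PetalBot entries, with the referrer and most of the user agent trimmed.

```python
import re

# One access-log line in the format shown earlier in this thread.
LINE = re.compile(
    r'^(?P<vhost>\S+) (?P<peer>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bot_hits(lines, agent_substring, path_substring=""):
    """Return (path, status) for requests whose user agent and path both match."""
    hits = []
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip truncated or otherwise unparseable lines
        if agent_substring.lower() not in match["agent"].lower():
            continue
        if path_substring not in match["path"]:
            continue
        hits.append((match["path"], int(match["status"])))
    return hits


sample = [
    'webtrees:80 172.18.0.1 - - [09/Mar/2023:14:39:15 -0800] '
    '"GET /family.php?famid=F7855&ged=tree1 HTTP/1.1" 406 487 "-" '
    '"Mozilla/5.0 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"',
    'webtrees:80 172.18.0.1 - -',  # truncated line, ignored
]

print(bot_hits(sample, "petalbot"))
# [('/family.php?famid=F7855&ged=tree1', 406)]
```

Running `bot_hits(open("access.log"), "Googlebot", "sitemap")` (the file name is a placeholder) would list the status code of every sitemap request made by Googlebot.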
Greg Roach - @fisharebest@phpc.social - fisharebest.webtrees.net

google indexing 1 week 1 day ago #38

  • comet48
  • comet48's Avatar Topic Author
  • Offline
  • New Member
  • Posts: 67
Saw this in the log:
webtrees:80 172.18.0.1 - - [16/Mar/2023:19:59:31 -0700] "GET /robots.txt HTTP/1.1" 200 1074 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
