Web based family history software

Question Robots.txt

3 years 2 months ago #1 by jheiler
Robots.txt was created by jheiler
Hi,
in the documentation I found the following:
"In webtrees 2.0, if you enable pretty URLs, it will be generated automatically."
Is it possible to extend this file manually, for example to disallow certain robots?

Kind regards

3 years 2 months ago #2 by fisharebest
Replied by fisharebest on topic Robots.txt
If the file /robots.txt does not exist, then webtrees will generate it dynamically.

To disable this, simply create your own /robots.txt

Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net

3 years 2 months ago #3 by jheiler
Replied by jheiler on topic Robots.txt
Thank you. Where can I find what should go in robots.txt by default, e.g. which directories to disallow?

3 years 2 months ago #4 by bertkoor
Replied by bertkoor on topic Robots.txt
See developers.google.com/search/docs/advanced/robots/intro
At first glance, it looks like you had better use the generated one.

What are your requirements exactly?

stamboom.BertKoor.nl runs on webtrees v1.7.13

3 years 2 months ago #5 by jheiler
Replied by jheiler on topic Robots.txt
My requirements are to complement the noindex/nofollow tags on the individual pages with something like
"User-agent:XXX
Disallow: /"
I am aware that this only works if the bot honours the robots.txt rules.
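
To see how a well-behaved crawler would read a rule like that, here is a minimal sketch using Python's standard urllib.robotparser. The bot name "ExampleBot" and the example.org URL are placeholders standing in for the XXX above, and the second group (allow everything else) is an assumption, not the webtrees default.

```python
from urllib.robotparser import RobotFileParser

# "ExampleBot" is a placeholder for the XXX in the post above.
# The "User-agent: *" group is an assumed catch-all, not the webtrees default.
RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, rules: str = RULES) -> bool:
    """Return True if a compliant crawler named user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("ExampleBot", "OtherBot"):
        print(bot, is_allowed(bot, "https://example.org/webtrees/"))
```

This only models crawlers that follow the rules. A bot that ignores robots.txt is not affected by it.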

3 years 2 months ago #6 by bertkoor
Replied by bertkoor on topic Robots.txt
I have a gut feeling: the robots you want to keep out don't respect what's specified in robots.txt

3 years 2 months ago #7 by fisharebest
Replied by fisharebest on topic Robots.txt
1) delete any existing robots.txt
2) obtain the default by visiting your-site/robots.txt
3) copy/paste the text into your own robots.txt
4) edit the file to add your own rule.

-- or --

Create a module which replaces the template file - resources/views/robots.txt.phtml
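
For the first option, step 4 amounts to appending one more rule group to the text copied in step 3. A rough sketch in Python, assuming you have already pasted the default into a string; the helper name add_block_rule and the bot name "ExampleBot" are made up for illustration, and the sample default below is a stand-in, not what webtrees actually generates.

```python
def add_block_rule(default_text: str, user_agent: str) -> str:
    """Return default_text with an extra group that blocks user_agent everywhere."""
    extra = f"User-agent: {user_agent}\nDisallow: /\n"
    # Separate groups with one blank line, whatever the pasted text ended with.
    return default_text.rstrip("\n") + "\n\n" + extra


if __name__ == "__main__":
    # Stand-in for the text copied from your-site/robots.txt (hypothetical content).
    pasted_default = "User-agent: *\nDisallow: /private\n"
    print(add_block_rule(pasted_default, "ExampleBot"))
```

Save the result as robots.txt yourself. Once that file exists, webtrees stops generating its own.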

3 years 2 months ago #8 by jheiler
Replied by jheiler on topic Robots.txt
Thank you so far.
I have a question about the default rules in webtrees. Many of my pages - even the home page! - are marked "<meta name="robots" content="noindex">". Those pages will therefore be skipped even by serious bots like Google or Bing. In my opinion, this is not an appropriate solution.
What is the rationale behind this?

3 years 2 months ago #9 by jheiler
Replied by jheiler on topic Robots.txt
I forgot to mention: Google Search Console says: 687 indexed, 7023 not indexed, incl. the home page.

3 years 2 months ago #10 by jheiler
Replied by jheiler on topic Robots.txt
After double-checking, I saw that this could be a problem with the Justlight theme.

3 years 2 months ago #11 by jheiler
Replied by jheiler on topic Robots.txt
For the time being, I have adjusted the meta.phtml file in the Justlight theme, and it works.

3 years 2 months ago #12 by fisharebest
Replied by fisharebest on topic Robots.txt
> Many of my pages - even the home page! - are marked "<meta name="robots" content="noindex">"

Not true?

Visit dev.webtrees.net/demo-dev/tree/demo
View the source code...
Code:
<meta name="robots" content="index,follow">
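
If you would rather check this without reading the page source by eye, the following sketch pulls the robots directives out of a page's HTML with Python's standard html.parser. The names RobotsMetaParser and robots_directives are made up for this example, and you have to supply the HTML yourself (for instance saved from "View source").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # "index,follow" becomes ["index", "follow"]
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> list[str]:
    """Return the robots meta directives found in html, in document order."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    print(robots_directives('<meta name="robots" content="index,follow">'))
```

A page whose list contains "noindex" is one that search engines are asked to leave out. Comparing the result for the default theme and a custom theme shows which one is setting it.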

3 years 2 months ago #13 by jheiler
Replied by jheiler on topic Robots.txt
It is a problem in the Justlight theme.

3 years 1 month ago - 3 years 1 month ago #14 by jheiler
Replied by jheiler on topic Robots.txt
Hi,
I have another question.
My web pages are located at www.heiler-ahnen.de/webtrees, and the automatically created robots.txt file can also be found there. But Google apparently looks for this file under www.heiler-ahnen.de and does not find it. Do I have to put a second robots.txt file there?
Kind regards
Last edit: 3 years 1 month ago by jheiler.

3 years 1 month ago #15 by fisharebest
Replied by fisharebest on topic Robots.txt
robots.txt files *only* work in the root folder.

You must take the robots.txt file created by webtrees
www.heiler-ahnen.de/webtrees/robots.txt
and then copy it to www.heiler-ahnen.de/robots.txt

(Tip - the robots.txt generated by webtrees contains these instructions!)

If you have other applications, then you must merge the robots.txt
with those for the rest of your site.
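
To make the "root folder only" rule concrete: whatever page a crawler is about to fetch, it looks for robots.txt at the top of the same host. A small sketch with Python's standard urllib.parse; the function name robots_url is made up, and the https:// scheme in the example is an assumption, since the thread gives the address without one.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location a crawler would consult for page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, drop the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # The scheme is assumed; the thread only gives the host and path.
    print(robots_url("https://www.heiler-ahnen.de/webtrees/robots.txt"))
```

So a file that only exists under /webtrees/ is never consulted, which matches what Google reported.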

2 years 11 months ago - 2 years 11 months ago #16 by Supermarkert
Replied by Supermarkert on topic Robots.txt
I've finally enabled pretty URLs for the first time, and the robots.txt file is not being generated at www.themarkerts.net (running 2.0.15). I just get a 404 page.

Any ideas?

Steve
I know just enough to be dangerous.
Last edit: 2 years 11 months ago by Supermarkert.

2 years 11 months ago #17 by fisharebest
Replied by fisharebest on topic Robots.txt
> I just get a 404 page.

Works for me...

www.themarkerts.net/robots.txt

2 years 11 months ago #18 by Supermarkert
Replied by Supermarkert on topic Robots.txt
Well, that's infuriating. I tried for an hour and couldn't get it to load. Now it's working fine.
Thanks for the response. I guess I'm good now.

Steve
