TOPIC:

Robots.txt 4 months 1 day ago #1

  • jheiler
  • Topic Author
  • New Member
  • Posts: 92
Hi,
in the documentation I found the following:
"In webtrees 2.0, if you enable pretty URLs, it will be generated automatically."
Is it possible to extend this file manually, for example to disallow certain robots?

Kind regards

Robots.txt 4 months 1 day ago #2

  • fisharebest
  • Administrator
  • Posts: 14511
If the file /robots.txt does not exist, then webtrees will generate it dynamically.

To disable this, simply create your own /robots.txt
Greg Roach - fisharebest.webtrees.net

Robots.txt 4 months 1 day ago #3

  • jheiler
  • Topic Author
  • New Member
  • Posts: 92
Thank you. Where can I find what should go into robots.txt by default, e.g. which directories to disallow?

Robots.txt 4 months 1 day ago #4

  • bertkoor
  • Platinum Member
  • Greetings from Utrecht, Holland
  • Posts: 2273
See developers.google.com/search/docs/advanced/robots/intro
At first glance, it looks like you are better off using the generated one.

What are your requirements exactly?
stamboom.BertKoor.nl runs on webtrees v1.7.13

Robots.txt 4 months 1 day ago #5

  • jheiler
  • Topic Author
  • New Member
  • Posts: 92
My requirement is to complement the noindex/nofollow tags on the individual pages with something like
"User-agent: XXX
Disallow: /"
I am aware that this only works if the bot follows the robots.txt conventions.
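
A rule of this kind can be checked before it is deployed. The following is a minimal sketch using Python's standard `urllib.robotparser`; the crawler name `ExampleBot` and the `example.org` URL are placeholders assumed for illustration, not values from this thread. It only shows how a well-behaved parser reads the rule and says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "ExampleBot" is a placeholder, not a real crawler name.
RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# The named bot is shut out of the whole site; other bots are unaffected.
blocked = is_allowed(RULES, "ExampleBot", "https://example.org/tree/demo")
others = is_allowed(RULES, "SomeOtherBot", "https://example.org/tree/demo")
```

Running the same check against the full text of the generated file, with the extra group appended, is a quick way to confirm that a new rule does not shadow the existing ones.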

Robots.txt 4 months 23 hours ago #6

  • bertkoor
  • Platinum Member
  • Greetings from Utrecht, Holland
  • Posts: 2273
I have a gut feeling: the robots you want to keep out don't respect what's specified in robots.txt

Robots.txt 4 months 22 hours ago #7

  • fisharebest
  • Administrator
  • Posts: 14511
1) delete any existing robots.txt
2) obtain the default by visiting your-site/robots.txt
3) copy/paste the text into your own robots.txt
4) edit the file to add your own rule.
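
Steps 3 and 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: `default_text` is a stand-in for whatever your site returns in step 2 (the real generated file will differ), and `ExampleBot` is a hypothetical crawler name.

```python
def add_rule(default_text: str, user_agent: str, disallow: str = "/") -> str:
    """Append a new rule group to an existing robots.txt body."""
    group = f"User-agent: {user_agent}\nDisallow: {disallow}\n"
    # Groups are separated by a blank line, so normalise the trailing newlines.
    return default_text.rstrip("\n") + "\n\n" + group


# Placeholder for the text copied in step 2; the real default will differ.
default_text = "User-agent: *\nDisallow: /example-path/\n"

custom = add_rule(default_text, "ExampleBot")
# `custom` is what would be saved as your own robots.txt in the site root.
```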

-- or --

Create a module which replaces the template file - resources/views/robots.txt.phtml

Robots.txt 3 months 3 weeks ago #8

  • jheiler
  • Topic Author
  • New Member
  • Posts: 92
Thank you so far.
I have a question about the default rules in webtrees. Many of my pages - even the home page! - are marked "<meta name="robots" content="noindex">". These pages will therefore not be indexed even by serious bots like Google or Bing. In my opinion, this is not an appropriate solution.
What is the rationale behind this?

Robots.txt 3 months 3 weeks ago #9

  • jheiler
  • Topic Author
  • New Member
  • Posts: 92
I forgot to mention: Google Search Console reports 687 pages indexed and 7023 not indexed, including the home page.

Robots.txt 3 months 3 weeks ago #10

  • jheiler
  • Topic Author
  • New Member
  • Posts: 92
After double-checking, I saw that this could be a problem with the Justlight theme.

Robots.txt 3 months 3 weeks ago #11

  • jheiler
  • Topic Author
  • New Member
  • Posts: 92
For the time being, I have adjusted the meta.phtml file in the Justlight theme, and it works.

Robots.txt 3 months 3 weeks ago #12

  • fisharebest
  • Administrator
  • Posts: 14511
> Many of my pages - even the home page! - are marked "<meta name="robots" content="noindex">"

Not true?

Visit dev.webtrees.net/demo-dev/tree/demo
View the source code...
<meta name="robots" content="index,follow">
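
Rather than reading the page source by eye, the robots meta tag can be extracted with a short script. This is a minimal sketch using Python's standard `html.parser`; the class and function names are made up for this example, and fetching the page itself is left out. Comparing the output for the same page under the default theme and under a custom theme should show whether the theme is what changes the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # "index,follow" -> ["index", "follow"]
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> list[str]:
    """Return the robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# The two tags quoted in this thread:
core = robots_directives('<meta name="robots" content="index,follow">')
themed = robots_directives('<meta name="robots" content="noindex">')
```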

Robots.txt 3 months 3 weeks ago #13

  • jheiler
  • Topic Author
  • New Member
  • Posts: 92
It is a problem in the Justlight theme.

Robots.txt 3 months 1 hour ago #14

  • jheiler
  • Topic Author
  • New Member
  • Posts: 92
Hi,
I have another question.
My web pages are located at www.heiler-ahnen.de/webtrees, and the automatically created robots.txt file can also be found there. But Google evidently looks for this file under www.heiler-ahnen.de and does not find it. Do I have to put a second robots.txt file there?
Kind regards

Robots.txt 2 months 4 weeks ago #15

  • fisharebest
  • Administrator
  • Posts: 14511
robots.txt files *only* work in the root folder.

You must take the robots.txt file created by webtrees
www.heiler-ahnen.de/webtrees/robots.txt
and then copy it to www.heiler-ahnen.de/robots.txt

(Tip - the robots.txt generated by webtrees contains these instructions!)

If you have other applications, then you must merge the robots.txt
with those for the rest of your site.
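
The "root folder only" rule can be illustrated with a few lines of Python: for any page URL, a crawler derives the robots.txt location from the scheme and host alone and ignores the path. This is a sketch using the standard `urllib.parse`; the `https` scheme and the `index.php` path are assumptions added for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the only robots.txt location a crawler consults for `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page inside the /webtrees subfolder still maps to the root robots.txt.
location = robots_txt_url("https://www.heiler-ahnen.de/webtrees/index.php")
```

Here `location` is `https://www.heiler-ahnen.de/robots.txt`, which is why the copy under /webtrees/ is never read.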

Robots.txt 3 weeks 4 days ago #16

  • Supermarkert
  • New Member
  • Posts: 36
I've finally enabled pretty URLs for the first time, and the robots.txt file is not being generated at www.themarkerts.net (running 2.0.15). I just get a 404 page.

Any ideas?
Steve
I know just enough to be dangerous.

Robots.txt 3 weeks 4 days ago #17

  • fisharebest
  • Administrator
  • Posts: 14511
> I just get a 404 page.

Works for me...

www.themarkerts.net/robots.txt

Robots.txt 3 weeks 4 days ago #18

  • Supermarkert
  • New Member
  • Posts: 36
Well, that's infuriating. I tried for an hour and couldn't get it to load. Now it's working fine.
Thanks for the response. I guess I'm good now.