This Help forum is for issues related to the latest release (1.1.x). For issues related to beta or svn versions, please use their own Help forums.
When requesting help, please provide as much information as possible. Explain what version of webtrees, PHP and MySQL you are using. If possible, provide a URL to your site so we can see the problem first-hand.
Tip: Think about putting these details in your signature, so they appear in the footer of ALL your messages.

TOPIC: Site Index woes

Site Index woes 8 years 11 months ago #1

  • keith_h
Having generated the site index files and submitted them to Google, I get this message:

1099
Parsing error
We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.

The top level site index file looks ok but the three sub index files are the ones facing issues I think. Here's a typical extract:

- <url>
<loc> www.heinrich.id.au/webtrees/index.php?ct...com&ged=heinrich.GED </loc>
<changefreq>monthly</changefreq>
<priority>0.7</priority>
</url>
- <url>
<loc> www.heinrich.id.au/webtrees/individual.p...001&ged=heinrich.GED </loc>
<lastmod>2011-05-13</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>


Can anyone see anything obviously wrong? Let me know if I should post the files somewhere for further analysis. Is there a validation tool on the internet somewhere that I can use to debug the sitemaps?

Once again, thank you for assisting.

Please Log in or Create an account to join the conversation.

Re: Site Index woes 8 years 11 months ago #2

Hi

It looks like the closing </loc> tag has been accidentally included in the URL (you will see it appears as part of the link here).

Can you raise a bug report for this please.
Nigel

www.our-families.info


Re: Site Index woes 8 years 11 months ago #3

  • ToyGuy
Keith
Not that it shouldn't be fixed, but may I ask your purpose for submitting the sitemap to Google? webtrees' sites are set by default to be bot/spidering friendly. JSYK, almost too friendly as the bots used much bandwidth in May and I don't submit sitemaps:
MSNBot:      2.19 GB
GoogleBot:   2.05 GB
Yandex Bot:  794.03 MB
Yahoo Slurp: 640.30 MB
BaiDu Bot:   624.25 MB

You may believe you need to do so, but as you can see, you don't.
Santa Stephen the Fabled Santa
Latest webtrees at MyArnolds.com
Hosted by webtreesonline.com , a division of GeneHosts LLC
MacOS 10.6.8, Apache 2.2+, PHP 5.4.16, MySQL 5.5.28

Re: Site Index woes 8 years 11 months ago #4

ToyGuy wrote: webtrees' sites are set by default to be bot/spidering friendly. JSYK, almost too friendly as the bots used much bandwidth in May and I don't submit sitemaps


Well, without a sitemap bots waste a lot of time indexing thousands of variations of indilist.php. A sitemap gives them direct links to individual.php which would be much more efficient. If a bot is only going to spend X amount of time on your site, it would be better to have them spend it on pages you actually want indexed.

kiwi wrote: It looks like the closing </loc> tag has been accidentally included in the URL (you will see it appears as part of the link here).

I don't think this is the problem; this is just the forum software getting confused because it doesn't understand XML.

keith_h wrote: Can anyone see anything obvious wrong?

I don't see any problems with the snippet you provided. Would you mind giving me the URL to your sitemap files?
Larry
webtrees 2.0.2
Hosted by fisharebest - hosting.webtrees.net

Re: Site Index woes 8 years 11 months ago #5

  • keith_h
Sorry, I should clarify: the error is from Google's Webmaster Tools, which won't recognise the sitemaps.

www.heinrich.id.au/webtrees/SitemapIndex.xml
www.heinrich.id.au/webtrees/SM_Curran family tree.xml
www.heinrich.id.au/webtrees/SM_GALLEN 200909.xml
www.heinrich.id.au/webtrees/SM_heinrich.xml

Thanks guys.

Checking sitemaps for other sites shows the same structure:

<url>
<loc> www.heinrich.id.au/wp/?p=267 </loc>
<lastmod>2011-05-22T00:11:19+00:00</lastmod>
<changefreq>monthly</changefreq>
<priority>0.6</priority>
</url>

I'll wait for some feedback on the full files before I raise a bug report.

Regards why I use them, standard practice on all my websites and sites I do for others.

Re: Site Index woes 8 years 11 months ago #6

Do you have the ability to manually edit this file?
www.heinrich.id.au/webtrees/SitemapIndex.xml

If so, try changing these references:
www.heinrich.id.au/webtrees/SM_Curran family tree.xml
www.heinrich.id.au/webtrees/SM_GALLEN 200909.xml

To this:
www.heinrich.id.au/webtrees/SM_Curran+family+tree.xml
www.heinrich.id.au/webtrees/SM_GALLEN+200909.xml

(i.e. replace the spaces with '+' )

If that solves the problem, this will be a fairly easy fix.
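
For illustration only, here is a minimal sketch of doing the same substitution programmatically. It is written in Python as an assumption (webtrees itself is PHP), and the helper name `encode_sitemap_url` and the `http://` scheme are hypothetical. Note that the standard library's `urllib.parse.quote` escapes a space as `%20` rather than `+`, which is the usual percent-encoding for a space in the path part of a URL.

```python
from urllib.parse import quote


def encode_sitemap_url(url: str) -> str:
    """Percent-encode characters (such as spaces) that may not appear raw in a URL.

    Hypothetical helper for illustration; not part of webtrees.
    """
    # Leave the characters that give a URL its structure untouched, and keep
    # '%' so that an already-encoded URL is not encoded a second time.
    return quote(url, safe=":/?&=%")


# The http:// scheme is an assumption; the thread shows the URLs without one.
print(encode_sitemap_url("http://www.heinrich.id.au/webtrees/SM_GALLEN 200909.xml"))
# prints http://www.heinrich.id.au/webtrees/SM_GALLEN%20200909.xml
```

Renaming the files so they contain no spaces at all avoids the question of which encoding to use.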

-larry

Re: Site Index woes 8 years 11 months ago #7

  • ToyGuy
As part of Larry's reply, and something we discussed offline:
Spaces are never a good idea. While webtrees can accommodate them, as can some of the more sophisticated browsers, not every platform, browser and program will adapt or convert them properly. This is also true of media and other file names.

Again, while webtrees will, in most cases, adapt to use of spaces, we recommend that you not use them, but rather replace them with underscores or dashes.

Re: Site Index woes 8 years 11 months ago #8

  • keith_h
Larry

Made the change; will advise if it works out. I am fairly confident this is the problem.

Regarding spaces, yep, agree. Legacy situation. Something I should have fixed before the migration.

Regards

Keith

Re: Site Index woes 8 years 11 months ago #9

  • fisharebest
<<Regards why I use them, standard practice on all my websites and sites I do for others. >>

They contain the last modification timestamp. This means that Google can read the sitemap file, discover that a page has not changed since it was last fetched, and therefore not fetch the page again.

In theory, this reduces your server load.
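
As a rough sketch of that idea (not Google's actual logic, which is not described here), the following Python compares each `<lastmod>` in a sitemap with the date a page was last fetched and lists only the pages that need fetching again. The function name `urls_needing_refetch` and the `last_fetched` dictionary are hypothetical, and Python is an assumption since webtrees itself is PHP.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace used by sitemap files that follow the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_needing_refetch(sitemap_xml: str, last_fetched: dict) -> list:
    """Return the URLs that changed since the crawler last fetched them.

    last_fetched maps a URL to the date it was last fetched (hypothetical input).
    """
    root = ET.fromstring(sitemap_xml)
    stale = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        fetched = last_fetched.get(loc)
        # Without a lastmod or a previous fetch there is nothing to compare,
        # so the page has to be fetched.
        if fetched is None or lastmod is None:
            stale.append(loc)
            continue
        # lastmod may be a plain date or a full timestamp; the first ten
        # characters are the YYYY-MM-DD part in both cases.
        if date.fromisoformat(lastmod.strip()[:10]) > fetched:
            stale.append(loc)
    return stale
```

Pages whose `<lastmod>` is older than the last fetch are skipped, which is where the saving in server load would come from.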

Also, we only allow the indilist.php, famlist.php, etc. files to be shown to search engines to allow them to discover unlinked individuals. These list pages were traditionally very slow (lots of privacy filtering). They are somewhat better now (since 1.1.2), but still take a lot of server resources. With sitemaps, you would not need to let search engines index these pages.

It is reported (anecdotally) that Google, etc., like sites with sitemaps (an indication of the site's quality), and favour them in page rankings. I have not been able to confirm or deny this, but it seems a widely-held belief.
Greg Roach - fisharebest.webtrees.net

Re: Site Index woes 8 years 11 months ago #10

  • keith_h
All I know is that sites I do this with rank well and quickly, others not so much. Still waiting for Google to update.

Re: Site Index woes 8 years 11 months ago #11

  • fisharebest
FYI, I have just generated a sitemap for my own site, and it validates OK using www.validome.org/google/validate

I then tried adding spaces to the URLs (e.g. &ged=my family.ged), and this also validates OK.

Does this validator give any useful information for your own sitemaps?
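
If an online validator is not conclusive, a quick local check can at least rule out the basics. The sketch below (Python, as an assumption; the function name `quick_sitemap_check` is hypothetical) only tests that the file is well-formed XML and that its root element is `urlset` or `sitemapindex` in the sitemaps.org namespace. It is not a schema validation and says nothing about how Google parses the file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
EXPECTED_ROOTS = {f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex"}


def quick_sitemap_check(xml_text: str) -> str:
    """Return "ok" or a short description of the first problem found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the parse failure.
        line, column = err.position
        return f"not well-formed XML at line {line}, column {column}: {err}"
    if root.tag not in EXPECTED_ROOTS:
        return f"unexpected root element {root.tag}"
    return "ok"
```

One thing this kind of check catches is a raw `&` inside a `<loc>` value; in the XML source it has to be written as `&amp;`, even though a browser displays it as a plain `&`.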

Re: Site Index woes 8 years 11 months ago #13

  • fisharebest
Well, if they look good to us, and they look good to the validator, then I don't know what to suggest...

Re: Site Index woes 8 years 11 months ago #14

  • fisharebest
...what about size?

There are limits on both size and number of entries.

The sitemap module does not take account of either of these. Is your file, perhaps, too big?
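
A minimal sketch of such a check, in Python (an assumption; the function name `check_sitemap_limits` is hypothetical). The default limits of 50,000 URLs and 50 MB uncompressed are what I understand the sitemaps.org protocol to specify at present (earlier versions gave a smaller size limit), so treat them as assumptions and confirm against the current specification.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_limits(xml_bytes: bytes,
                         max_urls: int = 50_000,             # assumed protocol limit
                         max_bytes: int = 50 * 1024 * 1024   # assumed protocol limit
                         ) -> list:
    """Return a list of limit violations for one sitemap file (empty if none)."""
    problems = []
    if len(xml_bytes) > max_bytes:
        problems.append(f"file is {len(xml_bytes)} bytes, limit is {max_bytes}")
    # Count the <url> entries directly under the root <urlset> element.
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{SITEMAP_NS}url"))
    if url_count > max_urls:
        problems.append(f"file has {url_count} URLs, limit is {max_urls}")
    return problems
```

A file that exceeds either limit would need to be split into several sitemaps listed in the sitemap index.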

Re: Site Index woes 8 years 11 months ago #15

  • keith_h
I'm thinking the validator is imagining things. I've done this: renamed the files with underscores, made sure the names are the same in the sitemap index (they weren't), and resubmitted.

Turns out the errors were 404 (file not found) errors. So this is basic stuff so far; I just wasn't paying attention.
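
For anyone hitting the same thing, this kind of mismatch can be caught before resubmitting. The sketch below (Python, as an assumption; the function name `missing_sitemap_files` and the `files_on_server` set are hypothetical) reads the `<loc>` entries from a sitemap index and reports any file name that is not in a given list of files actually present in the directory.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def missing_sitemap_files(index_xml: str, files_on_server: set) -> list:
    """List file names referenced by the sitemap index but absent on the server."""
    root = ET.fromstring(index_xml)
    missing = []
    for loc in root.findall("sm:sitemap/sm:loc", NS):
        url = (loc.text or "").strip()
        # Take the last path segment and undo any percent-encoding,
        # so that "%20" is compared as a space.
        name = PurePosixPath(unquote(urlsplit(url).path)).name
        if name not in files_on_server:
            missing.append(name)
    return missing
```

With an index that still refers to `SM_GALLEN 200909.xml` after the file has been renamed to `SM_GALLEN_200909.xml`, the old name would be reported as missing.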

So I have resubmitted and am again waiting for Google to download the files and validate them.

I'm thinking this might work out better now. One file is 180k.

Thanks for the observations, they have all been very helpful.

Re: Site Index woes 8 years 11 months ago #16

  • keith_h
Ok, got this error now:

2689
Parsing error
We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.

For this sitemap: SM_GALLEN_200909.xml

I'll keep tinkering with it I think.

www.xml-sitemaps.com/index.php?op=valida....xml&submit=Validate

Schema validating with XSV 3.1-1 of 2007/12/11 16:20:05
Target: www.heinrich.id.au/webtrees/SM_GALLEN_200909.xml
(Real name: www.heinrich.id.au/webtrees/SM_GALLEN_200909.xml
Length: 180092 bytes
Last Modified: Wed, 01 Jun 2011 12:13:27 GMT
Server: Apache)
docElt: {www.sitemaps.org/schemas/sitemap/0.9}urlset
Validation was strict, starting with type [Anonymous]
schemaLocs: www.sitemaps.org/schemas/sitemap/0.9 -> www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
The schema(s) used for schema-validation had no errors
No schema-validity problems were found in the target

Re: Site Index woes 8 years 11 months ago #17

  • keith_h
Success - no idea why. Many thanks for all your help.
