For issues related to the current stable release, please use its own Help forum.
IMPORTANT: Please read this before using a git or nightly build version: wiki.webtrees.net/en/GIT

Before asking for help please read "How to request help" by clicking on that tab above here.

TOPIC: Reverse Hierarchy Places Appearing in the Database

Reverse Hierarchy Places Appearing in the Database 3 months 3 weeks ago #1

  • dbq-andersons
  • Offline
  • New
  • Posts: 63
Hi Everyone,

I recently switched my main site (genealogy.dbq-andersons.com) from webtrees 1.7.x to webtrees 2.0.0 Beta 3. All has gone very well except for some place name oddities I've been troubleshooting over the past few days. Every so often, new place names are added to the database with the exact opposite hierarchy of existing place names. For example, Norway, Akershus, Nannestad, Kopperud is being added to the place tables in the database as Kopperud, Nannestad, Akershus, Norway. See the examples from the placelocation table below:

+-------+--------------+----------+------------------------------------+--------------+-------------+---------+-------------------------+
| pl_id | pl_parent_id | pl_level | pl_place                           | pl_long      | pl_lati     | pl_zoom | pl_icon                 |
+-------+--------------+----------+------------------------------------+--------------+-------------+---------+-------------------------+
|   643 |           45 |        3 | Kopperud                           | E10.9708611  | N60.1942667 |      12 | NULL                    |
|    45 |           44 |        2 | Nannestad                          |              |             |       2 |                         |
|    44 |           43 |        1 | Akershus                           |              |             |       2 |                         |
|    43 |            0 |        0 | Norway                             | E10          | N62         |       4 | NULL                    |


|  1439 |            0 |        0 | Kopperud                           |              |             |       2 |                         |
|  1440 |         1439 |        1 | Nannestad                          |              |             |       2 |                         |
|  1441 |         1440 |        2 | Akershus                           |              |             |       2 |                         |
|  1442 |         1441 |        3 | Norway                             |              |             |       2 |                         |
+-------+--------------+----------+------------------------------------+--------------+-------------+---------+-------------------------+

I have found the culprit. The SemRush search bot is crawling URLs that behave differently in WT 2.0 than in WT 1.7. Here is an example of the URL related to the place above:
http://genealogy.dbq-andersons.com/index.php?action=List&ged=scrubbed-tree&module=places_list&parent%5B0%5D=Kopperud&parent%5B1%5D=Nannestad&parent%5B2%5D=Akershus&parent%5B3%5D=Norway&route=module

In WT 1.7.x, that URL apparently silently errors out and drops me into the main tree index page. In WT 2.0, that URL takes me to a place name hierarchy page (with the hierarchy reversed from what it should be) and adds the bogus lines in the place tables as illustrated above.
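To make it clearer what the crawler is sending, the query string can be decoded. This is only a minimal sketch in Python (not webtrees code); the helper name `parent_chain` is made up for illustration:

```python
from urllib.parse import urlsplit, parse_qsl

url = (
    "http://genealogy.dbq-andersons.com/index.php?action=List&ged=scrubbed-tree"
    "&module=places_list&parent%5B0%5D=Kopperud&parent%5B1%5D=Nannestad"
    "&parent%5B2%5D=Akershus&parent%5B3%5D=Norway&route=module"
)


def parent_chain(url: str) -> list[str]:
    """Return the parent[N] values from the query string, ordered by index N."""
    indexed = {}
    for key, value in parse_qsl(urlsplit(url).query):
        # parse_qsl percent-decodes keys, so "parent%5B0%5D" arrives as "parent[0]".
        if key.startswith("parent[") and key.endswith("]"):
            index = key[len("parent["):-1]
            if index.isdigit():
                indexed[int(index)] = value
    return [indexed[i] for i in sorted(indexed)]


print(parent_chain(url))
```

The decoded list starts with Kopperud at index 0 and ends with Norway at index 3, which matches the bogus rows above where Kopperud sits at pl_level 0.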

Two things here. First, I'm working on blocking the SemRush bot (via robots.txt or .htaccess), so far without much success. Second, is there a way to prevent what that link is apparently doing in the database?

Any help would be appreciated.

Thanks,
Bill Anderson
Bill Anderson | Onalaska, WI | genealogy.dbq-andersons.com
Webtrees 2.0.0 beta 5 | Apache 2.4.29-1 | PHP 7.2.20-1 | MySQL 5.7.28
Ubuntu 18.04.2 LTS Running on a PC in My Basement
The administrator has disabled public write access.

Reverse Hierarchy Places Appearing in the Database 3 months 3 weeks ago #2

  • dbq-andersons
  • Offline
  • New
  • Posts: 63
UPDATE:
It took a while, but the SemRush bot seems to be reading and obeying my robots.txt. All I get from it now is this:
46.229.168.136 - - [13/Aug/2019:09:52:36 -0500] "GET /robots.txt HTTP/1.1" 200 1157 "-" "Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)"
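For anyone wanting to check a rule set before waiting for the bot to come back, Python's standard `urllib.robotparser` can evaluate it offline. A rough sketch follows; the robots.txt content shown is an assumption for illustration, not the actual file from my site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file is not shown in this thread.
ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if the given user-agent token may fetch the path under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


print(allowed("SemrushBot", "/index.php?route=module"))
print(allowed("Googlebot", "/index.php"))
```

This only shows what a well-behaved crawler should conclude from the file; it says nothing about bots that ignore robots.txt.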

I'm still at least a little curious about how the URL it was requesting was able to insert rows into the database, and I wonder whether that should be investigated further or disabled by people with more PHP knowledge than me.

Thanks,

Bill

Reverse Hierarchy Places Appearing in the Database 3 months 3 weeks ago #3

  • fisharebest
  • Offline
  • Administrator
  • Posts: 11662
There are a few issues here.

Firstly, the place-hierarchy code will create the database entries if they don't exist. For example, if you ask it to create a representation of "London, England", it will create England (if it doesn't exist), and then create London (if it doesn't exist).

(There is also lots of caching here, so that if you create a second instance of the place "London, England", webtrees won't need to go back to the database.)
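As a rough illustration of this get-or-create pattern with a cache in front of it, here is a sketch in Python. It is not the webtrees implementation (which is PHP); the `PlaceStore` class, its fields, and the in-memory dictionaries standing in for the database are all assumptions:

```python
class PlaceStore:
    """In-memory stand-in for the place table; all names here are hypothetical."""

    def __init__(self) -> None:
        self.rows: dict[tuple[int, str], int] = {}   # (parent_id, name) -> id
        self.cache: dict[tuple[str, ...], int] = {}  # full place chain -> id
        self.next_id = 1
        self.lookups = 0  # counts simulated database round trips

    def get_or_create(self, place: str) -> int:
        """Resolve "London, England" from the top level down, creating missing rows."""
        parts = tuple(p.strip() for p in place.split(","))
        if parts in self.cache:
            return self.cache[parts]  # second request never touches the "database"
        parent_id = 0
        for name in reversed(parts):  # England first, then London
            self.lookups += 1
            key = (parent_id, name)
            if key not in self.rows:
                self.rows[key] = self.next_id
                self.next_id += 1
            parent_id = self.rows[key]
        self.cache[parts] = parent_id
        return parent_id


store = PlaceStore()
first = store.get_or_create("London, England")
second = store.get_or_create("London, England")
print(first == second, store.lookups, len(store.rows))
```

In this sketch, asking for the reversed string "England, London" would quietly add two more rows, which is the same kind of effect as the reversed Norway rows in the first post.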

For the most part, this is very efficient, and gives much better performance than the v1 code.

But, as your example shows, if the user can create a URL that references a non-existent place, then they can trigger that place to be created in the database. A single non-existent place is harmless, although a malicious attacker could theoretically create a large number of random place names and cause your database/disk to fill up, which would amount to a "denial of service" attack.
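One possible way to avoid this, sketched in Python purely as an assumption and not as what webtrees does or will do, is to resolve request-supplied names with a lookup that never inserts. The `EXISTING` dictionary and `find_place` function are hypothetical; the ids are taken from the table in the first post:

```python
from typing import Optional

# (parent_id, name) -> id; a hypothetical stand-in for the existing place rows.
EXISTING = {
    (0, "Norway"): 43,
    (43, "Akershus"): 44,
    (44, "Nannestad"): 45,
    (45, "Kopperud"): 643,
}


def find_place(chain_top_down: list[str]) -> Optional[int]:
    """Resolve a chain (top level first) without creating rows; None if a level is missing."""
    parent_id = 0
    for name in chain_top_down:
        place_id = EXISTING.get((parent_id, name))
        if place_id is None:
            return None  # the caller could respond "not found" instead of inserting
        parent_id = place_id
    return parent_id


print(find_place(["Norway", "Akershus", "Nannestad", "Kopperud"]))
print(find_place(["Kopperud", "Nannestad", "Akershus", "Norway"]))
```

With this approach the reversed chain from the crawler's URL simply fails to resolve, and nothing is written.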

Secondly, robots shouldn't be able to use the place-hierarchy pages. In webtrees v1, we detect robots using their user-agent strings and block them that way. This technique worked well when it was created (10 years ago?), but is less effective today. In webtrees v2, we load this type of page content using JavaScript (robots don't run JavaScript).
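The user-agent approach boils down to something like the following sketch (Python, for illustration only). The marker list is invented for this example and is not the list webtrees v1 uses:

```python
# Illustrative substrings only; not the list webtrees v1 actually uses.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def looks_like_robot(user_agent: str) -> bool:
    """Guess whether a request comes from a robot, based on its user-agent string."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


semrush = "Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)"
browser = "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"
print(looks_like_robot(semrush), looks_like_robot(browser))
```

A robot that sends a browser-like user-agent string passes this check untouched, which is why the technique is less effective today.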

Reports, charts, etc. all load content using JavaScript. I guess the place-hierarchy should do the same.

Thirdly, I am wondering where the robot found this URL. webtrees uses HTML "meta robots" tags and "rel=nofollow" attributes to prevent robots from following/indexing these links. If the robot obeys robots.txt, it should (presumably) also obey these directives. Maybe we are missing some markup somewhere.
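One way to hunt for missing markup would be to scan a rendered page for links that lack rel="nofollow". A small sketch using Python's standard `html.parser`; the class name, function name, and sample HTML are made up for illustration:

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Collect href values of <a> tags that lack rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.hrefs.append(href)


def followable_links(html: str) -> list[str]:
    """Return the hrefs a compliant robot would still be allowed to follow."""
    parser = FollowableLinks()
    parser.feed(html)
    return parser.hrefs


sample = '<a href="/a" rel="nofollow">x</a><a href="/b">y</a>'
print(followable_links(sample))
```

This only checks link attributes; a page-level "meta robots" tag would need a separate check.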
Greg Roach - fisharebest.webtrees.net