Web based family history software

Idea: get information from Wikidata (need help with programming)

2 months 2 weeks ago #1 by hermann
I just released a new version of my custom module german-chancellors-presidents, which uses a table of the chancellors and presidents of Germany and presents the historical data, together with images, in the timeline of an individual. Now I had the idea that, instead of using a table filled in by hand, it would be better to use the information available in Wikidata. A SPARQL query shows all the heads of the former state GDR as a JSON object. But when I send that query from my PHP module using the following code, no answer is available (no error, but the content of $response is only "<HTML>"). What am I doing wrong?
Code:
/**
 * Performs a cURL request to the Wikidata API and fetches a JSON response.
 *
 * @param string $query the SPARQL query
 * @return string the API response as a JSON string
 */
function readWikidata(string $query): string
{
    // Encode the query for the API
    $url = "https://query.wikidata.org/sparql?format=json&query=" . urlencode($query);

    // Use cURL to make the API request
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Accept: application/json']);
    $response = curl_exec($ch);
    curl_close($ch);

    return $response;
}
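A likely explanation, offered as an assumption rather than a confirmed diagnosis: Wikimedia's User-Agent policy asks clients to identify themselves, and the Wikidata Query Service may answer requests without a User-Agent header with an HTML error page (HTTP 403). PHP's cURL sends no User-Agent unless one is set, while Guzzle sends its own default one. The code above also cannot tell an error page from JSON, because it never looks at the HTTP status. A minimal sketch with both changes; the function names and the User-Agent string are hypothetical:

```php
<?php
// Builds the same URL as readWikidata(): format=json plus the urlencoded query.
function wikidataSparqlUrl(string $query): string
{
    return 'https://query.wikidata.org/sparql?format=json&query=' . urlencode($query);
}

// Sketch: cURL call with an explicit User-Agent (assumed to be required by
// the service) and a status check, so an HTML error page is never returned as "JSON".
function readWikidataWithCurl(string $query, string $userAgent): string
{
    $ch = curl_init(wikidataSparqlUrl($query));
    if ($ch === false) {
        throw new RuntimeException('curl_init() failed');
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,            // seconds
        CURLOPT_USERAGENT      => $userAgent,    // e.g. "MyModule/1.0 (you@example.org)"
        CURLOPT_HTTPHEADER     => ['Accept: application/sparql-results+json'],
    ]);
    $body   = curl_exec($ch);                    // string, or false on transport errors
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $error  = curl_error($ch);
    // No curl_close(): since PHP 8 the handle is freed when $ch goes out of scope.

    if ($body === false || $status !== 200) {
        throw new RuntimeException("Wikidata request failed (HTTP $status) $error");
    }
    return $body;
}
```

With the exception in place, a 403 shows up as an error message instead of a silent "<HTML>" string.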

     

Hermann
Designer of the custom module "Extended Family"

webtrees 2.2.1 (all custom modules installed, PHP 8.3.12, MariaDB 10.6) @ ahnen.hartenthaler.eu
2 months 2 weeks ago #2 by ekdahl
This is an interesting topic for me too, although for a different use.
There is a Wikidata project with data about all Swedish church parishes (the lowest level in my place hierarchy), including the higher-level entities and coordinates (link to project).
I would like to use this data to either
* query it to get suggestions when entering places on events (and save only the places actually used)
or
* populate the place hierarchy with a Wikidata query and store all places in webtrees.

Since Wikidata should contain lots of data useful for webtrees, maybe some common code could simplify fetching it and avoid reinventing the wheel.
Sorry for being slightly off-topic and not being able to offer help; I just wanted to say that fetching data from Wikidata can be useful for many things in webtrees.
2 months 2 weeks ago #3 by Franz Frese
What exactly is the content of $response when you say it is only "<HTML>"?
Or do you mean the response is an HTML page starting with <html>?

If there is no JSON, perhaps your query was not understood (or the service is not available)?!
2 months 2 weeks ago #4 by bertkoor
An HTML response instead of JSON could be due to many things. I'm not at a PC right now; maybe tomorrow I can help find the fault. I'd try making the call with curl from the CLI (command prompt, terminal), look closely at the output, and check and double-check the code.

The German chancellors and presidents... 22 lines is not much data. It starts in 1949, so on my site it adds nothing to my public data.

Is it really better to fetch it from Wikidata than to keep the data in a CSV file yourself? Are you fetching it on each page render, or caching it?

In programming there are only two hard problems:
1. Cache invalidation
2. Naming things
3. Off-by-one errors

The Swedish churches... There already is a method to import geographical data in CSV format. If you use that, then nothing else needs to be done. Doing the calls will be far more complicated and has a bigger impact on webtrees code.

stamboom.BertKoor.nl runs on webtrees v2.2.1

2 months 2 weeks ago #5 by ekdahl

> The Swedish churches... There already is a method to import geographical data in CSV format. If you use that, then nothing else needs to be done. Doing the calls will be far more complicated and has a bigger impact on webtrees code.
Good point. Then I only need to produce a query that outputs the results as CSV in the format webtrees expects.
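A minimal sketch of the conversion step, assuming the parish data is fetched as SPARQL JSON (the result format used elsewhere in this thread). The column list is a placeholder supplied by the caller: the exact layout the webtrees place import expects is not stated here, so compare with a CSV exported from your own installation and name the SPARQL variables to match.

```php
<?php
// Turn a SPARQL JSON result into CSV text with a caller-chosen column order.
// $columns are SPARQL variable names; they also become the CSV header row.
function sparqlJsonToCsv(string $json, array $columns): string
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $out  = fopen('php://memory', 'w+');

    // Separator, enclosure and an empty escape character are passed explicitly;
    // the empty escape gives plain RFC 4180 style quoting.
    fputcsv($out, $columns, ',', '"', '');
    foreach ($data['results']['bindings'] as $binding) {
        $row = [];
        foreach ($columns as $var) {
            // Variables bound by OPTIONAL may be missing from a binding.
            $row[] = $binding[$var]['value'] ?? '';
        }
        fputcsv($out, $row, ',', '"', '');
    }

    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}
```

The resulting text can be saved to a file and imported once, so webtrees itself never has to call Wikidata.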

2 months 2 weeks ago #6 by hermann
Caching is a good idea in my case, because the data volume is low and the data changes only about once every four years for the chancellors or presidents. I can use the webtrees database as a cache, or do you have better ideas?
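Besides a database table, a plain file cache is another option. A minimal sketch, assuming a writable cache directory; the function name, directory and TTL are hypothetical and up to the module. For data that changes every few years, a TTL of a day or a week is plenty.

```php
<?php
// Minimal file cache: return the stored copy while it is younger than $ttl
// seconds, otherwise call $fetch (e.g. the Wikidata request) and store the result.
function cachedFetch(string $cacheDir, string $key, int $ttl, callable $fetch): string
{
    $file = rtrim($cacheDir, '/') . '/' . sha1($key) . '.json';

    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return (string) file_get_contents($file);
    }

    $data = $fetch();                          // may throw; nothing is cached then
    file_put_contents($file, $data, LOCK_EX);  // LOCK_EX avoids interleaved writes
    return $data;
}
```

Usage, with a placeholder path: `$json = cachedFetch('/path/to/cache', $query, 7 * 86400, fn () => readWikidata($query));`. If the refresh fails, this sketch throws instead of falling back to the stale copy; a module might prefer to keep serving the old file.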

I got an <html> tag instead of the expected JSON object as the answer. The SPARQL query itself is OK.

I also have more ideas on how to use Wikidata for webtrees. Sometimes it is better to take the information from Wikidata and build a cache in webtrees (for example: export, convert to CSV, and use it as location data in webtrees). Sometimes it is better to query Wikidata directly, as the GOV Vesta module does with the GOV web service, because GOV and Wikidata are so large that it makes no sense to copy all that information to your webtrees server.

2 months 2 weeks ago #7 by bertkoor
Now at my PC, looking at your code, I notice that the actual SPARQL query is not given; it is the argument passed to your function.

When I click on the link you gave for "all the heads of the former state GDR" etc., I get this result:

Code:
SPARQL-QUERY: queryStr=SELECT%20?officeHolderLabel%20WHERE%20{%20wd:Q16957%20p:P35%20?statement.%20?statement%20ps:P35%20?officeHolder.%20SERVICE%20wikibase:label%20{%20bd:serviceParam%20wikibase:language%20%27en%27.%20}}
java.util.concurrent.ExecutionException: org.openrdf.query.MalformedQueryException: Encountered " <VAR3> "%20 "" at line 1, column 7.
Was expecting one of:
    "(" ...
    "*" ...
    "distinct" ...
    "reduced" ...
    <VAR1> ...
    <VAR2> ...
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)

I do know a thing or two about doing http calls with curl, but I'm clueless about WikiData and SPARQL.

2 months 2 weeks ago #8 by hermann
I do not think that the problem is in the SPARQL query. Here is an example:
Code:
SELECT ?officeHolderLabel ?birthDate ?deathDate WHERE {
    wd:Q4970706 p:P1308 ?statement.
    ?statement ps:P1308 ?officeHolder.
    OPTIONAL { ?officeHolder wdt:P569 ?birthDate. } # date of birth
    OPTIONAL { ?officeHolder wdt:P570 ?deathDate. } # date of death
    SERVICE wikibase:label { bd:serviceParam wikibase:language 'de,en'. }
}
You can use query.wikidata.org to test such queries.
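This query returns ?birthDate and ?deathDate as xsd:dateTime literals. As far as I know they look like 1893-06-30T00:00:00Z, and a date known only to the year still comes back as a full date. To place them in a webtrees timeline, a module would typically need GEDCOM-style dates. A minimal sketch, assuming that input format and ignoring Wikidata's date precision; the function name is hypothetical:

```php
<?php
// Convert an xsd:dateTime value as returned for wdt:P569 / wdt:P570
// (e.g. "1893-06-30T00:00:00Z") into a GEDCOM-style date ("30 JUN 1893").
// Assumption: precision is ignored, so a year-only value becomes "1 JAN yyyy".
function xsdToGedcomDate(string $xsd): ?string
{
    // BCE dates (leading "-") and malformed values are not handled.
    if (!preg_match('/^(\d{4})-(\d{2})-(\d{2})T/', $xsd, $m)) {
        return null;
    }
    $months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
               'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'];
    $month = (int) $m[2];
    if ($month < 1 || $month > 12) {
        return null;
    }
    return (int) $m[3] . ' ' . $months[$month - 1] . ' ' . $m[1];
}
```

For exact handling of year-only or decade-only dates, the query would have to fetch the precision from the full statement value as well, which this sketch does not do.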

When I click in my browser on the similar link "heads of the former state GDR as JSON object" with the URL-encoded query, I get a response like this:
Code:
{
  "head": { "vars": [ "officeHolderLabel" ] },
  "results": {
    "bindings": [
      { "officeHolderLabel": { "xml:lang": "en", "type": "literal", "value": "Walter Ulbricht" } },
      { "officeHolderLabel": { "xml:lang": "en", "type": "literal", "value": "Erich Honecker" } },
      { "officeHolderLabel": { "xml:lang": "en", "type": "literal", "value": "Willi Stoph" } },
      { "officeHolderLabel": { "type": "literal", "value": "Q57232" } },
      { "officeHolderLabel": { "xml:lang": "en", "type": "literal", "value": "Wilhelm Pieck" } },
      { "officeHolderLabel": { "xml:lang": "en", "type": "literal", "value": "Egon Krenz" } },
      { "officeHolderLabel": { "xml:lang": "en", "type": "literal", "value": "Manfred Gerlach" } }
    ]
  }
}
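To use such a response in PHP, decode it and read the "value" field of each binding. Note the fourth binding: it has no "xml:lang" and its value is a bare item ID (Q57232), which seems to be what the label service returns when no label exists in the requested language. A minimal sketch that collects the values and skips such bare IDs; the function name is hypothetical, and whether to skip, flag or look up these entries is a design choice for the module:

```php
<?php
// Extract one variable from a SPARQL JSON result, skipping values that are
// only an item ID such as "Q57232" (label missing in the requested language).
function sparqlValues(string $json, string $var): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $values = [];
    foreach ($data['results']['bindings'] as $binding) {
        $value = $binding[$var]['value'] ?? null;
        if ($value !== null && !preg_match('/^Q\d+$/', $value)) {
            $values[] = $value;
        }
    }
    return $values;
}
```

Requesting several languages, as with 'de,en' in the query with birth and death dates, should make such gaps rarer.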

I do not understand why you, Bert, got a Java error when clicking on the same link.

I would expect to receive such a JSON object with the code in my first post, so something must be wrong with my code. Maybe
Code:
curl_exec($ch);
does not work inside webtrees?

2 months 2 weeks ago #9 by bertkoor

> A SPARQL query shows all the heads of the former state GDR as JSON object. But ...

For me, simply clicking this link does not work, at least not in the Safari browser.

Now that I have the full URL in this window, pasting it into a browser does give results. Likely something to do with encoding the curly braces or the like. The response looks like JSON in my browser, but the underlying document seems to be HTML.

When I copy and paste the URL to use it with the CLI version of curl, it says "nested brace in URL position 195".
I could solve that by replacing { with %7B and } with %7D.
Code:
$ curl -v "https://query.wikidata.org/sparql?format=json&query=SELECT%20?officeHolderLabel%20WHERE%20%7B%20wd:Q16957%20p:P35%20?statement.%20?statement%20ps:P35%20?officeHolder.%20SERVICE%20wikibase:label%20%7B%20bd:serviceParam%20wikibase:language%20%27en%27.%20%7D%7D"
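Two encoding details seem to be involved here. The Java error in post #7 reports "%20" at column 7, which suggests the server received the query still percent-encoded, i.e. encoded twice when following the forum link. The "nested brace" message comes from curl's URL globbing, which treats { and } as a pattern; curl's -g (--globoff) option switches that off. In PHP, urlencode() already encodes the braces. A small sketch comparing the two PHP encoders on a hypothetical query:

```php
<?php
// urlencode() uses form encoding (space => "+"), rawurlencode() follows
// RFC 3986 (space => "%20"). Both encode "{", "}" and "?".
$query = 'SELECT ?x WHERE { ?x ?p ?o }';

$formStyle = urlencode($query);
$rfcStyle  = rawurlencode($query);

echo $formStyle, PHP_EOL; // SELECT+%3Fx+WHERE+%7B+%3Fx+%3Fp+%3Fo+%7D
echo $rfcStyle, PHP_EOL;  // SELECT%20%3Fx%20WHERE%20%7B%20%3Fx%20%3Fp%20%3Fo%20%7D

// http_build_query() encodes each value the same way as urlencode().
$url = 'https://query.wikidata.org/sparql?' . http_build_query(['format' => 'json', 'query' => $query]);
echo $url, PHP_EOL;
```

Either encoding should be fine in a query string; the important part is to encode exactly once.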

I'm getting the header "content-type: application/sparql-results+json;charset=utf-8", so the response is definitely JSON.

2 months 2 weeks ago #10 by bertkoor
Progress! You don't need cURL; I don't see it being used anywhere in the webtrees code. So I asked Google how to perform a simple HTTP call, and it came up with a Stack Overflow answer:
> How to send a GET request from PHP?
>
> Unless you need more than just the contents of the file, you could use file_get_contents.
>
> Code:
> $xml = file_get_contents("http://www.example.com/file.xml");

And indeed, `file_get_contents` is used in multiple places in webtrees. It seems Wikidata does not really care about the "Accept" header; you have the query parameter "format=json" for that.

Putting it all together:
Code:
$query = "SELECT ?officeHolderLabel WHERE { wd:Q16957 p:P35 ?statement. ?statement ps:P35 ?officeHolder. SERVICE wikibase:label { bd:serviceParam wikibase:language 'en'. }}";
$url = "https://query.wikidata.org/sparql?format=json&query=" . urlencode($query);
$result = file_get_contents($url);


But there might be a reason why Greg chose to use Guzzle in the webtrees code (example here).

2 months 2 weeks ago #11 by hermann
Thank you, Bert! This is very helpful.

I first checked that urlencode() replaces the opening and closing braces. That is ok.

Then I tried to use the function file_get_contents(), but that resulted in an error: 
Failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden

Then I used the following code:
Code:
use GuzzleHttp\Client;

$client = new Client(['timeout' => 3]);  // timeout in seconds
$response = $client->get($url);
return $response->getBody()->getContents();
Eureka!
It is working! Thank you so much! You made my last day of the year!
Best regards, Hermann

2 months 2 weeks ago #12 by fisharebest
> But there might be a reason why in webtrees code Greg chose to use Guzzle.

There is a php.ini setting (allow_url_fopen) which prevents or allows file_get_contents() to work with URLs. It was frequently disabled by web hosts for "security reasons". I'm not sure whether it is widely disabled today. Probably not.

The Guzzle library makes everything a lot easier. For example, it is possible to set timeouts for file_get_contents(), but it is not well documented.

But we only use a tiny part of this library, so it might be worth replacing it with our own code. I'll investigate...
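For reference, the timeout mentioned above is the "timeout" option of PHP's HTTP stream context, which file_get_contents() accepts as its third argument (allow_url_fopen must be enabled). The same context can send a User-Agent header; that this is what the 403 Forbidden in post #11 was about is an assumption, not something confirmed in this thread. A minimal sketch; the function name and User-Agent value are hypothetical:

```php
<?php
// Build an HTTP stream context for file_get_contents():
// - timeout:       seconds before the request is abandoned
// - user_agent:    sent as the User-Agent header
// - ignore_errors: return the body of 4xx/5xx responses instead of false,
//                  so an HTML error page can be inspected
function wikidataContext(string $userAgent, float $timeout = 3.0)
{
    return stream_context_create([
        'http' => [
            'method'        => 'GET',
            'header'        => "Accept: application/sparql-results+json\r\n",
            'user_agent'    => $userAgent,
            'timeout'       => $timeout,
            'ignore_errors' => true,
        ],
    ]);
}

// Usage (performs a network request, so it is not run here):
// $json = file_get_contents($url, false, wikidataContext('MyModule/1.0 (you@example.org)'));
```

Compared with Guzzle this is more manual: status codes and redirects have to be handled by hand, which is part of why a library is convenient.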

Greg Roach - greg@subaqua.co.uk - @fisharebest@phpc.social - fisharebest.webtrees.net

2 months 2 weeks ago #13 by hermann
Thanks to your help, there is now an updated module that is able to read from Wikidata: github.com/hartenthaler/german-chancellors-presidents.

Please try it and give me feedback. And send me more ideas about what can be done with Wikidata. There are several small and a few large genealogical trees in Wikidata. How could they be presented in or imported into webtrees?
