How can I fetch these records from Wikipedia as easily as possible? I need a JSON file with the displayed names for each of the categories listed here: https://en.wikipedia.org/wiki/Category:Surnames_by_language
Example
[
  {
    "name": "Agalliu",
    "language": "Albanian"
  },
  {
    "name": "Agolli",
    "language": "Albanian"
  }
  ...
]
I'm working with Angular 5.
Also, is it legal for me to create a database from this data if I state that it comes from Wikipedia?
I don't work with Angular 5 or TypeScript, so I can't tell you at a technical level how to write the specific code you need, but I think you should have a look at the HttpClient documentation. A search on GitHub might also help you find a module that already does this. Angular seems very well documented, which is nice. So my answer is more theoretical than technical.
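Whatever HTTP client you end up using, the request itself could look roughly like the following. This is only a minimal sketch in TypeScript, assuming you go through the MediaWiki API and its list=categorymembers query. The function and interface names are my own, and the response shape only covers the fields used here.

```typescript
// Subset of the categorymembers response that this sketch relies on.
interface CategoryMember {
  pageid: number;
  ns: number; // namespace: 0 = article, 14 = category
  title: string;
}

interface CategoryMembersResponse {
  continue?: { cmcontinue: string };
  query: { categorymembers: CategoryMember[] };
}

const API_ENDPOINT = "https://en.wikipedia.org/w/api.php";

// Builds the request URL for one batch of members of a category.
// Pass the cmcontinue value from the previous response to get the next batch.
function buildCategoryMembersUrl(categoryTitle: string, cmcontinue?: string): string {
  const params: [string, string][] = [
    ["action", "query"],
    ["list", "categorymembers"],
    ["cmtitle", categoryTitle],
    ["cmtype", "page|subcat"],
    ["cmlimit", "500"],
    ["format", "json"],
    ["origin", "*"], // needed for anonymous cross-origin requests from a browser
  ];
  if (cmcontinue !== undefined) {
    params.push(["cmcontinue", cmcontinue]);
  }
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${API_ENDPOINT}?${query}`;
}

// Separates article pages (surnames) from subcategories (languages to descend into).
function splitMembers(response: CategoryMembersResponse): { pages: string[]; subcategories: string[] } {
  const members = response.query.categorymembers;
  return {
    pages: members.filter(m => m.ns === 0).map(m => m.title),
    subcategories: members.filter(m => m.ns === 14).map(m => m.title),
  };
}
```

You would pass the URL from buildCategoryMembersUrl("Category:Surnames by language") to your HTTP client, feed the parsed JSON to splitMembers, and repeat the request for each subcategory, and for as long as the response contains a continue value.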
About the data you want in the JSON file (the surname and the language of that surname): if you only want to work with the pages in the category, I think the best way is to take the name from the title of each page and the language from the title of the subcategory being analyzed. If you do it this way, keep two things in mind:
Category titles need cleaning. For example, "Irish-language feminine surnames" and "Irish-language masculine surnames" should both be cleaned to "Irish". It would also be worth having another JSON value that keeps the title of the category, because it would help you recover the URL in the future.
Page titles need cleaning too, for example "Hoti (surname)". As with the category title, I recommend creating another JSON value that keeps the original title of the page, in case you need it later.
I think another good way to do it is to query Wikidata. The Wikipedia pages have very different structures, and there is no infobox common to all of them that would let you scrape a specific field (the language or whatever it may be). However, extracting the data from Wikidata instead of from the category has disadvantages too, for example entries such as "surname (multiple languages)".
Take a look at the MediaWiki API and Wikidata:Data Access.
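The title cleaning described above could be sketched like this. It is a rough TypeScript illustration, not tested against every category. The regular expressions assume category titles of the form "X-language ... surnames" and page titles with an optional parenthesised suffix, and the names SurnameRecord, languageFromCategory, nameFromPage and toRecord are hypothetical.

```typescript
// One entry of the output file: the cleaned values plus the original titles.
interface SurnameRecord {
  name: string;
  language: string;
  page: string;     // original page title, kept so the URL can be rebuilt
  category: string; // original category title, kept for the same reason
}

// "Category:Irish-language feminine surnames" -> "Irish"
function languageFromCategory(categoryTitle: string): string {
  return categoryTitle
    .replace(/^Category:/, "")      // drop the namespace prefix if present
    .replace(/-language\b.*$/, "")  // drop "-language ..." and everything after it
    .replace(/\s+surnames$/, "")    // fallback for titles like "X surnames"
    .trim();
}

// "Hoti (surname)" -> "Hoti"; titles without a suffix are returned unchanged
function nameFromPage(pageTitle: string): string {
  return pageTitle.replace(/\s*\([^)]*\)\s*$/, "").trim();
}

function toRecord(pageTitle: string, categoryTitle: string): SurnameRecord {
  return {
    name: nameFromPage(pageTitle),
    language: languageFromCategory(categoryTitle),
    page: pageTitle,
    category: categoryTitle,
  };
}
```

For example, toRecord("Agolli", "Category:Albanian-language surnames") yields a record with name "Agolli" and language "Albanian", while keeping both original titles. Categories that do not follow the assumed naming pattern would need their own rules.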
Yes, it is perfectly legal. What you have to do is respect the license. The English Wikipedia is licensed under Creative Commons Attribution-ShareAlike 3.0 Unported. This license allows you to reuse and modify the content, commercially or non-commercially, but you must attribute the authors and share any derivatives under the same license.
In the case of Wikidata, everything in the item and property namespaces (Q:* and P:*) is in the public domain and marked as CC0, a Creative Commons tool for indicating that a work is in the public domain. What can you do with the data? Whatever you want.
I recommend reading the Creative Commons FAQ about CC0 and the legal code of the Creative Commons Attribution-ShareAlike 3.0 Unported license.