So I am connecting to the https://genderize.io/ API. I want to scrape from this API as fast as possible because I might need to do 1,000,000 searches at a time. Is it possible to create 100,000 different curl_init handles (10 names per request), each with different parameters, and then execute them all in parallel? It seems too good to be true. If I can't do this, how else can I speed up the requests? My current code uses a single curl_init handle and changes the URL on each iteration of a for loop. Here is my current loop:
    $ch3 = curl_init();
    for ($x = 0; $x < $loopnumber; $x += 10) {
        // Take up to 10 names; array_slice avoids reading past the end of $firstnames.
        $batch = array_slice($firstnames, $x, 10);
        curl_setopt_array($ch3, array(
            CURLOPT_RETURNTRANSFER => 1,
            // http_build_query URL-encodes the names and emits name[0]=...&name[1]=...
            CURLOPT_URL => 'https://api.genderize.io?' . http_build_query(array('name' => $batch), '', '&')
        ));
        $resp3 = curl_exec($ch3);
        echo $resp3;
        $genderresponse = json_decode($resp3, true);
    }
Yes, it is possible, in theory. But no, it won't work in practice. You had better stay within a few hundred parallel connections.
You will probably run out of sockets and possibly memory before you can create one million easy handles and add them to a libcurl multi handle.
If you communicate with a single remote IP address and port number and you have only one local IP address, then, because each connection needs its own local port number, you cannot have more than 64K connections in parallel even in theory. You won't even get to 64K on most operating systems in their default configuration. (You can do more if you talk to more remote IPs or have more local IPs to bind the connections to.)
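To see how far below 64K your own machine sits, you can check the local port range the operating system hands out. The following is only a sketch and assumes Linux, where the range is exposed in /proc/sys/net/ipv4/ip_local_port_range; the helper name countEphemeralPorts is made up for this example.

```php
<?php
// Hypothetical helper: count the ports in a "low high" range string,
// the format used by /proc/sys/net/ipv4/ip_local_port_range on Linux.
function countEphemeralPorts(string $range): int
{
    // Split on any whitespace (the file uses a tab between the two numbers).
    [$low, $high] = array_map('intval', preg_split('/\s+/', trim($range)));
    return $high - $low + 1; // the range is inclusive on both ends
}

// Assumption: Linux only. On other systems the file does not exist and nothing is printed.
$path = '/proc/sys/net/ipv4/ip_local_port_range';
if (is_readable($path)) {
    echo countEphemeralPorts(file_get_contents($path)), " local ports available\n";
}
```

On a system whose range is 32768 to 60999, this reports 28232 ports, which is the upper bound on simultaneous connections from one local IP to one remote IP and port.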
For the sake of argument, assume you actually reach 60K simultaneous connections. You will then find that the curl_multi_* API slows to a crawl with that many connections, because it is select/poll based. libcurl itself has an event-based API, which is the recommended one once you go beyond perhaps a few hundred parallel connections, but from within PHP you have no way to access or use it.
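What you can do from PHP is keep a bounded number of transfers in flight with curl_multi_* and add a new one each time one finishes. Below is a minimal sketch of that idea, not a tuned implementation: the helper names buildGenderizeUrl and fetchAllBounded and the default of 100 parallel transfers are assumptions for illustration, and any rate limits the API imposes are not handled here.

```php
<?php
// Hypothetical helper: build one request URL for a batch of up to 10 names.
function buildGenderizeUrl(array $names): string
{
    return 'https://api.genderize.io/?'
        . http_build_query(['name' => array_values($names)], '', '&');
}

// Hypothetical helper: fetch every URL while keeping at most $maxParallel
// transfers in flight. Returns decoded JSON (or null on failure) per URL index.
function fetchAllBounded(array $urls, int $maxParallel = 100): array
{
    $urls = array_values($urls);
    $total = count($urls);
    if ($total === 0) {
        return [];
    }

    $multi = curl_multi_init();
    $results = [];
    $next = 0;      // index of the next URL to start
    $inFlight = 0;  // transfers added to the multi handle and not yet finished

    $addNext = function () use (&$next, &$inFlight, $urls, $multi) {
        $ch = curl_init($urls[$next]);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PRIVATE, (string) $next); // remember the URL index
        curl_multi_add_handle($multi, $ch);
        $next++;
        $inFlight++;
    };

    // Fill the initial window.
    while ($next < $total && $inFlight < $maxParallel) {
        $addNext();
    }

    do {
        $status = curl_multi_exec($multi, $running);
        $refilled = false;

        // Collect finished transfers and start a new one for each.
        while ($info = curl_multi_info_read($multi)) {
            $ch = $info['handle'];
            $idx = (int) curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$idx] = $info['result'] === CURLE_OK
                ? json_decode(curl_multi_getcontent($ch), true)
                : null;
            curl_multi_remove_handle($multi, $ch);
            $inFlight--;
            if ($next < $total) {
                $addNext();
                $refilled = true;
            }
        }

        // Wait for socket activity unless new transfers still need a first exec call.
        if ($running > 0 && !$refilled && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // avoid a busy loop when select cannot wait
        }
    } while ($inFlight > 0 && $status === CURLM_OK);

    // Handles are released when they go out of scope at the end of the function.
    ksort($results);
    return $results;
}
```

With the names from the question, a call could look like `fetchAllBounded(array_map('buildGenderizeUrl', array_chunk($firstnames, 10)), 200)`. Raising the limit far beyond a few hundred runs into the select/poll slowdown described above.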