swift curl nsurlsession urlsession

Get webpage content with URLSession returns 403


I'm trying to get the contents of a webpage via URLSession, but the response status code is 403, and the body suggests the site has some kind of bot protection.

The strange thing is that when I fetch the page via curl, it returns the page contents properly.

I checked the curl command with the --verbose option, and there were 3 headers sent with the request. I added them to the URLRequest, but I'm still getting a 403 response. What's the difference between a curl GET request and a URLSession request? How can I simulate the curl request, or a browser request, via URLSession?

 var urlRequest = URLRequest(url: url)
 urlRequest.httpMethod = "GET"
 urlRequest.setValue("websiteurl.com",forHTTPHeaderField: "Host")
 urlRequest.setValue("curl/8.4.0",forHTTPHeaderField: "User-Agent")
 urlRequest.setValue("*/*",forHTTPHeaderField: "Accept")
        
 let (data, response) = try await URLSession.shared.data(for: urlRequest)
 print((response as? HTTPURLResponse)?.statusCode ?? -1)
 // prints out 403
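To compare what URLSession receives with what `curl --verbose` shows, it can help to dump the status, the response headers, and the start of the body in one place. The sketch below is a hypothetical diagnostic helper (`summarize(data:response:bodyPrefix:)` is not part of Foundation); it only formats what `data(for:)` already returns.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLResponse types live here on Linux
#endif

/// Hypothetical helper: formats the status code, all response headers
/// (sorted by name for stable output) and the first `bodyPrefix`
/// characters of the body, for side-by-side comparison with curl.
func summarize(data: Data, response: URLResponse, bodyPrefix: Int = 200) -> String {
    var lines: [String] = []
    if let http = response as? HTTPURLResponse {
        lines.append("status: \(http.statusCode)")
        let headers = http.allHeaderFields
            .map { ("\($0.key)", "\($0.value)") }
            .sorted { $0.0 < $1.0 }
        for (name, value) in headers {
            lines.append("\(name): \(value)")
        }
    } else {
        lines.append("status: (not an HTTP response)")
    }
    // Decode leniently so a non-UTF-8 body still produces some output.
    let body = String(decoding: data, as: UTF8.self)
    lines.append("body: \(body.prefix(bodyPrefix))")
    return lines.joined(separator: "\n")
}
```

Usage, right after the request: `print(summarize(data: data, response: response))`.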

Solution

  • I tried this in a unit test and it works fine:

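    // Inside an XCTestCase subclass; requires `import XCTest` and `import Combine`.
    func testExample() throws {
        let exp = expectation(description: "...")
        let request = URLRequest(url: URL(string: "https://dibamovie14.top")!)
        // Keep the AnyCancellable alive: releasing it cancels the request.
        let cancellable = URLSession.shared.dataTaskPublisher(for: request).sink { completion in
            print(completion) // .finished or .failure(URLError)
            exp.fulfill()
        } receiveValue: { data, response in
            print(response)
        }
        
        waitForExpectations(timeout: 4) { error in
            guard let error = error else { return }
            print(error.localizedDescription)
        }
        cancellable.cancel()
    }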
    

    This call uses dataTaskPublisher, but you should be able to apply the same changes to your async approach. URLSession sets a User-Agent by default anyway, and "Host" is not required.

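    Addendum: you can set the User-Agent if you need it, but "Host" should not be set. URLSession fills it in from the request URL, and Apple's URLRequest documentation lists Host among the reserved HTTP headers you shouldn't modify: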

    Host: header field, mandatory since HTTP/1.1.[17] If the request is generated directly in HTTP/2, it should not be used.[18]
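Applied to the request from the question, the addendum amounts to dropping the Host line and keeping the rest. A minimal sketch, assuming you still want the option of a custom User-Agent; `makeRequest(url:userAgent:)` is a hypothetical helper name. When no User-Agent is passed, the unsent request has none, and URLSession adds its default one at send time.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux
#endif

/// Hypothetical helper: builds a GET request without a Host header,
/// since URLSession derives Host from the URL itself.
func makeRequest(url: URL, userAgent: String? = nil) -> URLRequest {
    var request = URLRequest(url: url)
    request.httpMethod = "GET"
    request.setValue("*/*", forHTTPHeaderField: "Accept")
    // Only override the User-Agent when explicitly asked to.
    if let userAgent = userAgent {
        request.setValue(userAgent, forHTTPHeaderField: "User-Agent")
    }
    return request
}
```

In the async version this becomes `try await URLSession.shared.data(for: makeRequest(url: url))`.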

    Async Await

    func testExampleAsync() async throws {
        let request = URLRequest(url: URL(string: "https://dibamovie14.top")!)
        let (data, response) = try await URLSession.shared.data(for: request)
        // Unwrap explicitly so the output isn't printed as Optional(...)
        print((response as? HTTPURLResponse)?.statusCode ?? -1)
        print(String(data: data, encoding: .utf8) ?? "<non-UTF-8 body>")
    }