swift, swiftsoup

How can I get list elements with web scraping?


I need to extract list elements via web scraping. I can't access the elements one by one; I only get all of them as a single string. How can I get the individual list elements using SwiftSoup or some other approach?

Here is my function:

 self.webView.evaluateJavaScript("document.getElementsByTagName('html')[0].innerHTML") { (value, error) in
            if error != nil {
                print("Err: \(error)")
            }else{
                
                //print(value!)
                
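                // Avoid a crash if the JavaScript result is not a String.
                guard let htmlString = value as? String else { return }
                self.innerDetail = htmlString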
                
                do {
                    let html = self.innerDetail
                    let doc: Document = try SwiftSoup.parse(html)
                    
                    // Here we get the image URLs for the detail pages.
                    let imageLink = try doc.getElementsByClass("img-container")
                    let src: Elements = try imageLink.select("img[src]")
                    let imageUrlStringArray: [String?] = src.array().map { try? $0.attr("src").description }
                    
                    print(imageUrlStringArray)  // This holds all the detail image URLs.
                    
                    
                    // Here the car's make, model, year, mileage (km), and fuel type need to be extracted.
                    // But first the price, of course.
                    
                    let priceMainClass = try doc.getElementsByClass("price")
                    print(try priceMainClass.text())  // This is the price.
                    
                    
                    // A lot of data comes in here, and it is in list form.
                    let detailClass = try doc.getElementsByClass("classified-info-list").first()
                    
                    print(try detailClass?.html())
                    
                    print(try detailClass?.text())
                    
                    
                    
                    let detailFeatures = try detailClass?.text()
                    //print(detailFeatures)
                    //self.detailFeaturesArr = detailFeatures?.components(separatedBy: " ") as! [String]
                    
                    
                    
                } catch {
                    print("err")
                }
            }
        }

With detailClass?.text() I get the data, but only as a single string. The output of detailClass?.html() contains the list that I want to extract the data from.

Here is the list data from detailClass?.html():

Optional("<li> <strong>Fiyat</strong> <span class=\"price\"> 77.500 TL<input id=\"priceHistoryFlag\" type=\"hidden\" value=\"\" autocomplete=\"off\"> \n  <!-- ngIf: hasPriceHistory --> \n  <!-- ngIf: hasPriceHistory --> </span> </li> \n<li> <strong> İlan Tarihi</strong>&nbsp; <span> 01 Ekim 2020</span> </li> \n<li> <strong>İlan No</strong>&nbsp; <span class=\"classifiedId\" id=\"classifiedId\">865620915</span> </li> \n<li> <strong>Marka</strong>&nbsp; <span>Volvo&nbsp;</span> </li> \n<li> <strong>Seri</strong>&nbsp; <span>S40&nbsp;</span> </li> \n<li> <strong>Model</strong>&nbsp; <span>2.0 T&nbsp;</span> </li> \n<li> <strong>Yıl</strong>&nbsp; <span class=\"\"> 1999</span> </li> \n<li> <strong>Yakıt</strong>&nbsp; <span class=\"\"> Benzin &amp; LPG</span> </li> \n<li> <strong>Vites</strong>&nbsp; <span class=\"\"> Otomatik</span> </li> \n<li> <strong>KM</strong>&nbsp; <span class=\"\"> 178.000</span> </li> \n<li> <strong>Kasa Tipi</strong>&nbsp; <span class=\"\"> Sedan</span> </li> \n<li> <strong>Motor Gücü</strong>&nbsp; <span class=\"\"> 160 hp</span> </li> \n<li> <strong>Motor Hacmi</strong>&nbsp; <span class=\"\"> 1948 cc</span> </li> \n<li> <strong>Çekiş</strong>&nbsp; <span class=\"\"> Önden Çekiş</span> </li> \n<li> <strong>Renk</strong>&nbsp; <span class=\"\"> Gümüş Gri</span> </li> \n<li> <strong>Garanti</strong>&nbsp; <span class=\"\"> Hayır</span> </li> \n<li> <strong>Plaka / Uyruk</strong>&nbsp; <span class=\"\"> Türkiye (TR) Plakalı</span> </li> \n<li> <strong>Kimden</strong>&nbsp; <span class=\"fromOwner\"> Sahibinden</span> </li> \n<li> <strong>Görüntülü Arama İle Görülebilir</strong>&nbsp; <span class=\"\"> Evet</span> </li> \n<li> <strong>Takas</strong>&nbsp; <span> Hayır&nbsp; </span> </li> \n<li> <strong>Durumu</strong>&nbsp; <span> İkinci El&nbsp; </span> </li> \n<li class=\"hiddenAttributes\"> <input type=\"hidden\" autocomplete=\"off\" class=\"classifiedAttr\" id=\"attrClassifiedId\" value=\"865620915\"> <input type=\"hidden\" 
autocomplete=\"off\" class=\"classifiedAttr\" id=\"attrIsShipping\" value=\"false\"> </li>")

Sorry about my English. I hope this is understandable.


Solution

  • I solved the problem by adding the code below. I found the answer in a Python question: How to get a list of the <li> elements in an <ul> with Selenium using Python?

    Here is my code:

                        // A lot of data comes in here, and it is in list form.
                        let detailClass = try doc.getElementsByClass("classified-info-list").first()
                        
                        
                        // Unwrap safely instead of force-unwrapping, then print each <li>'s text.
                        if let listItems = try detailClass?.getElementsByTag("li") {
                            for item in listItems.array() {
                                let text = try item.text()
                                print(text)
                            }
                        }
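
    If you need each row as a separate label and value rather than one line of text per <li>, the same idea can be taken one step further. This is only a minimal sketch: parseInfoList is a hypothetical helper name (it is not part of SwiftSoup), and it assumes each row keeps the <strong>label</strong> <span>value</span> layout shown in the question's HTML. The sample markup is a shortened copy of that output, wrapped in a <ul> that I am assuming carries the classified-info-list class.

```swift
import Foundation
import SwiftSoup

// Hypothetical helper (not part of SwiftSoup): turns every <li> inside the
// "classified-info-list" element into a (label, value) pair.
func parseInfoList(from html: String) throws -> [(label: String, value: String)] {
    let doc = try SwiftSoup.parse(html)
    var pairs: [(label: String, value: String)] = []

    // Select every <li> below the list container.
    for item in try doc.select(".classified-info-list li").array() {
        // Skip rows that lack a <strong> label or a <span> value,
        // such as the hidden-attributes <li>.
        guard let strong = try item.select("strong").first(),
              let span = try item.select("span").first() else { continue }

        // Trim surrounding whitespace, including the &nbsp; characters.
        let label = try strong.text().trimmingCharacters(in: .whitespacesAndNewlines)
        let value = try span.text().trimmingCharacters(in: .whitespacesAndNewlines)
        pairs.append((label: label, value: value))
    }
    return pairs
}

// Shortened sample based on the HTML in the question (assumed <ul> wrapper).
let sampleHTML = """
<ul class="classified-info-list">
  <li> <strong>Marka</strong>&nbsp; <span>Volvo&nbsp;</span> </li>
  <li> <strong>Yıl</strong>&nbsp; <span class=""> 1999</span> </li>
  <li class="hiddenAttributes"> <input type="hidden" value="865620915"> </li>
</ul>
"""

do {
    for pair in try parseInfoList(from: sampleHTML) {
        print("\(pair.label): \(pair.value)")
    }
} catch {
    print("Parsing failed: \(error)")
}
```

    In the question's code you would pass self.innerDetail to the helper instead of sampleHTML. Returning an array of pairs rather than a dictionary keeps the rows in page order; building a [String: String] from it is straightforward if the labels are unique.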