Tuesday, February 2, 2021

How to extract all p elements from a web page using Jsoup

I'm trying to extract Hebrew text from a web page:

https://www.sefaria.org/Berakhot.2a.2?lang=he&with=all&lang2=he

But the result is only the first p element, so I also tried this selector:

Elements elements = document.select(".segmentNumber sans .content-section p");

But it returns nothing.

Can you tell me what's wrong with the code, and how I can get all the p elements from the web page?

Thanks.

This is the code:


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;

public class Scraping {

    public static void main(String[] args) throws IOException {
        try {
            Document document = Jsoup.connect("https://www.sefaria.org/Berakhot.2a?lang=he").get();

            System.out.println(document.text());
            System.out.println("Selecting HTML tag name having specified class name");

            Elements elements = document.select("p.he");
            if (elements.size() > 0) {
                System.out.println(elements.get(0));
            }
        } catch (IOException ioe) {
            System.out.println("Unable to connect to the URL");
        }
    }
}
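
One likely reason only the first p is printed is that the code above calls elements.get(0), which returns a single element even when the selector matches several. A minimal sketch of looping over every match follows. It parses an inline HTML string instead of the live page so it can run offline; the markup, the class name AllParagraphs, and the helper textsOf are hypothetical and only illustrate the idea.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AllParagraphs {

    // Hypothetical helper: collects the text of every element matching
    // the CSS query, rather than only the first match.
    static List<String> textsOf(Document document, String cssQuery) {
        List<String> texts = new ArrayList<>();
        for (Element element : document.select(cssQuery)) {
            texts.add(element.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for the real page (an assumption,
        // not the actual structure of the site).
        String html = "<div>"
                + "<p class=\"he\">first</p>"
                + "<p class=\"he\">second</p>"
                + "<p class=\"en\">other</p>"
                + "</div>";
        Document document = Jsoup.parse(html);

        // Print every matching paragraph, one per line.
        for (String text : textsOf(document, "p.he")) {
            System.out.println(text);
        }
    }
}
```

As for the second selector: in CSS a space is the descendant combinator, so ".segmentNumber sans .content-section p" looks for a sans tag nested inside .segmentNumber. If sans was meant as a second class on the same element, it would be written ".segmentNumber.sans". Note also that Jsoup does not execute JavaScript, so any content the page adds with scripts after loading will not appear in the parsed document.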
