First, import two libraries, BeautifulSoup (the bs4 package) and urllib3. Then load the web page using the PoolManager
class in urllib3, and use BeautifulSoup
to parse the HTML code and extract the information you need.
import bs4
import urllib3
# Create a connection pool for making HTTP requests
http = urllib3.PoolManager()
# Load a web page
response = http.request('GET', 'http://www.example.com/ai-news')
# Parse the HTML code
soup = bs4.BeautifulSoup(response.data, 'html.parser')
# Extract AI news titles
titles = soup.find_all('h2')
for title in titles:
    print(title.text)
# Extract AI news article text
articles = soup.find_all('p')
for article in articles:
    print(article.text)
The above code uses BeautifulSoup to find every element enclosed in h2
tags in the HTML and display its text. It then does the same for every element enclosed in p
tags. In this way, you can scrape the AI news titles and article body text from the web page, provided the page marks up titles as h2 and body text as p.
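Calling find_all('h2') and find_all('p') separately returns two unrelated lists, so it is not obvious which paragraphs belong to which title. The following is a minimal sketch of one way to keep them together. It assumes that each article's p elements follow its h2 as siblings in the markup, which may not hold for a real page. The function name extract_news and the SAMPLE_HTML string are hypothetical and used only for illustration; in practice the response.data bytes from the request could be passed in instead.

```python
import bs4


def extract_news(html):
    """Return a list of (title, paragraphs) pairs, one per h2 heading.

    Assumption: the p elements of an article are siblings that follow
    its h2 heading, up to the next h2.
    """
    soup = bs4.BeautifulSoup(html, 'html.parser')
    news = []
    for heading in soup.find_all('h2'):
        paragraphs = []
        # Look only at the h2 and p siblings that come after this heading.
        for sibling in heading.find_next_siblings(['h2', 'p']):
            if sibling.name == 'h2':
                # The next heading starts a new article, so stop here.
                break
            paragraphs.append(sibling.get_text(strip=True))
        news.append((heading.get_text(strip=True), paragraphs))
    return news


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
<h2>First headline</h2>
<p>First paragraph.</p>
<p>Second paragraph.</p>
<h2>Second headline</h2>
<p>Third paragraph.</p>
</body></html>
"""

if __name__ == '__main__':
    for title, paragraphs in extract_news(SAMPLE_HTML):
        print(title)
        for paragraph in paragraphs:
            print('  ' + paragraph)
```

Grouping by sibling order is only one possible choice. If a page wraps each article in its own container element, selecting that container first and then searching inside it would likely be more reliable.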