Hello, this is Minapipin (@python_mllover)!
This time, I'm jotting down a note on the BeautifulSoup methods for removing and replacing specific HTML tag elements.
Removing and replacing specific HTML tag elements with BeautifulSoup
In Python's BeautifulSoup, you can remove or replace a specific HTML tag element by calling the .extract() and .replace_with() methods on it.
from bs4 import BeautifulSoup
txt = """<p>I have a dog. His name is <span class="secret">Ken</span>.</p>"""
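# name the parser explicitly; omitting it triggers a warning and can vary by environment
soup = BeautifulSoup(txt, "html.parser")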
# This keeps "unwanted" information
soup.get_text()
#: 'I have a dog. His name is Ken.'
# remove an element by tag matching
soup.find("span", {"class":"secret"}).extract()
soup.get_text()
#: 'I have a dog. His name is .'
# or you can replace that with something
soup = BeautifulSoup(txt, "html.parser")
soup.find("span", {"class":"secret"}).replace_with("confidential")
soup.get_text()
#: 'I have a dog. His name is confidential.'
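The example above handles only the first match, because find() returns a single element (or None when nothing matches, in which case calling .extract() on it raises an AttributeError). As a rough sketch of how the same idea might be applied to every matching element, the snippet below loops over find_all(). The redact() helper, its replacement parameter, and the sample HTML are made up for illustration and are not part of the original note.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input with two elements to hide
html = '<p>My dogs are named <span class="secret">Ken</span> and <span class="secret">Pochi</span>.</p>'


def redact(html, replacement=None):
    """Remove every <span class="secret">, or swap it for `replacement` text.

    Illustrative helper (an assumption, not a BeautifulSoup API).
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list-like snapshot, so editing the tree
    # while looping over it is safe; an empty result just skips the loop
    for tag in soup.find_all("span", class_="secret"):
        if replacement is None:
            tag.extract()                  # drop the element from the tree
        else:
            tag.replace_with(replacement)  # put plain text in its place
    return soup.get_text()


print(redact(html))
print(redact(html, "confidential"))
```

Since find_all() simply returns an empty result when nothing matches, this version also avoids the None check that find() would need.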
Reference: https://qiita.com/kota9/items/ee921b742f65b3db50bd