Selenium Web Scraping with Beautiful Soup on Dynamic Content and Hidden Data

Target the element once it has loaded, and pass it to the script as arguments[0] instead of reading the entire page through document:

html_of_interest=driver.execute_script('return arguments[0].innerHTML',element)
sel_soup=BeautifulSoup(html_of_interest, 'html.parser')
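As a minimal sketch, the two lines above can be wrapped in a reusable helper. The name `element_soup` is hypothetical, and `FakeDriver` is only a stand-in that returns canned HTML so the snippet runs without a browser; with a real WebDriver you would pass `driver` and an element you have already located.

```python
from bs4 import BeautifulSoup


def element_soup(driver, element):
    """Parse only the markup inside `element`, not the whole page."""
    # arguments[0] is the element passed as the extra argument to execute_script.
    html = driver.execute_script('return arguments[0].innerHTML', element)
    return BeautifulSoup(html, 'html.parser')


class FakeDriver:
    """Stand-in for a Selenium driver (assumption, for demonstration only)."""

    def execute_script(self, script, *args):
        # A real driver runs the script in the browser; this returns canned HTML.
        return '<ul><li class="row">first</li><li class="row">second</li></ul>'


soup = element_soup(FakeDriver(), element=None)
rows = [li.get_text() for li in soup.find_all('li', class_='row')]
print(rows)
```

Parsing only the fragment keeps the soup small and avoids matching unrelated elements elsewhere on the page.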

This has 2 practical cases:


The element is not yet loaded in the DOM, and you need to wait for it:

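from time import sleep
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver.get() usually returns once the page has loaded, but some JavaScript
# may still run after the load event; `experimental` is a delay you tune by trial.
sleep(experimental)
try:
    # `delay` is the maximum number of seconds to wait for the element.
    element = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, 'your_id_of_interest')))
    print("Element is ready, do the thing!")
    html_of_interest = driver.execute_script('return arguments[0].innerHTML', element)
    sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
except TimeoutException:
    print("Something's wrong!")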


The element is inside a shadow root, and you first need to expand the shadow root. This is probably not your situation, but I mention it here because it is relevant for future reference. For example:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

def expand_shadow_element(element):
    # Ask the browser for the shadow root attached to the host element.
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root

driver.get("chrome://settings")
# find_element_by_tag_name was removed in Selenium 4; use find_element with By.TAG_NAME.
root1 = driver.find_element(By.TAG_NAME, 'settings-ui')
html_of_interest = driver.execute_script('return arguments[0].innerHTML', root1)
sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
sel_soup  # empty: the shadow root has not been expanded yet

shadow_root1 = expand_shadow_element(root1)
html_of_interest = driver.execute_script('return arguments[0].innerHTML', shadow_root1)
sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
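A possible variation on the shadow-root example is to read the shadow root's innerHTML inside a single script call, so only a string crosses back to Python. This is a sketch, not the original answer's method: `shadow_soup` and `SHADOW_HTML_JS` are hypothetical names, and the fallback to the host's own innerHTML is an added assumption for elements without an open shadow root.

```python
from bs4 import BeautifulSoup

# Use the open shadow root if there is one, otherwise the element itself.
SHADOW_HTML_JS = (
    'const root = arguments[0].shadowRoot;'
    'return (root || arguments[0]).innerHTML;'
)


def shadow_soup(driver, host):
    """Parse the contents of `host`'s shadow root (or of `host` if it has none)."""
    html = driver.execute_script(SHADOW_HTML_JS, host)
    return BeautifulSoup(html, 'html.parser')


# Intended usage with a real driver and a located host element:
# sel_soup = shadow_soup(driver, root1)
```

How a returned shadow root object is represented in Python can differ between Selenium and browser versions, which is the motivation for returning only the HTML string here. Note that innerHTML does not include the contents of shadow roots nested deeper inside, so nested hosts need the same call applied to each of them.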
