2020/9/29 15:26:30, Author: Huang Bing (黄兵)

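A Summary of Web Scraping with Selenium and Chrome in Python

I have recently been studying web scraping in Python and came across a website whose content is generated by JavaScript, so scraping it requires Selenium driving Chrome.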

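1. First, download ChromeDriver (the driver executable, not the Chrome browser itself), choosing the release that matches your installed Chrome version, from: ChromeDriver - WebDriver for Chrome

2. After downloading, extract the archive.

3. Install the Selenium package. I installed it directly through PyCharm; it can also be downloaded here: https://pypi.org/simple/selenium/

4. Then start coding. The complete code is as follows: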

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from apscheduler.schedulers.blocking import BlockingScheduler
from bs4 import BeautifulSoup
from selenium import webdriver
from urllib.error import HTTPError, URLError
import requests
from datetime import datetime
import os
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

from config import logger_config, REQUEST_URI, RECEIVE_SMS_URI, PING_ME_URI


class WebScrapingMain:
    def __init__(self):
        logger_name = 'Web Scraping to america.materialtools.com'
        self._logger_write_file = logger_config.LoggingConfig().init_logging(logger_name)

    @staticmethod
    def get_ping_me_svg():
        # Path to the chromedriver executable extracted in step 2.
        # (Selenium 4 deprecates executable_path in favor of a Service object.)
        driver = webdriver.Chrome(executable_path=r'E:/Tools/chromedriver_win32/chromedriver.exe')
        uri = 'receive-sms-online-usa-12202007090'
        try:
            driver.get(PING_ME_URI + uri)
            # Wait up to 10 seconds for the JavaScript-rendered <svg> to appear.
            get_svg = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.TAG_NAME, "svg")))
            # Save a screenshot of the element, named by the current timestamp.
            captcha_dir = os.path.join(os.getcwd(), 'captcha_img')
            os.makedirs(captcha_dir, exist_ok=True)
            captcha_png_path = os.path.join(captcha_dir, f'{datetime.now().timestamp()}.png')
            get_svg.screenshot(captcha_png_path)
        finally:
            # Always close the browser, even if the wait times out.
            driver.quit()


if __name__ == '__main__':
    # Create a scheduler instance
    scheduler = BlockingScheduler()
    web_scraping = WebScrapingMain()
    # Add the job with an interval trigger: run once every minute
    scheduler.add_job(web_scraping.get_ping_me_svg, 'interval', seconds=60)
    # Start the scheduler (this call blocks the current thread)
    scheduler.start()
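
The `WebDriverWait(driver, 10).until(...)` call in the code above is an explicit wait: it re-checks a condition at a fixed interval until the condition returns something truthy or the timeout expires. The sketch below illustrates that polling idea with the standard library only. It is not Selenium's implementation; `wait_until`, `WaitTimeout`, and `fake_find_svg` are hypothetical names invented for this illustration, and the fake lookup simply stands in for a page whose `<svg>` shows up a little late.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition stays falsy for the whole timeout (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Usage: a stand-in for a page whose <svg> only "appears" on the third check.
attempts = {"count": 0}


def fake_find_svg():
    attempts["count"] += 1
    return "<svg>" if attempts["count"] >= 3 else None


element = wait_until(fake_find_svg, timeout=2.0, poll_interval=0.01)
print(element, attempts["count"])  # <svg> 3
```

Because a timed-out wait raises an exception, any code that needs the element should run only after the wait has succeeded, and the browser should be closed in a `finally` block so that a timeout does not leave a Chrome process behind.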

Some of the problems I ran into when using Chrome are covered in this article: Message: 'chromedriver_win32' executable may have wrong permissions.
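
The linked article is not reproduced here. As an assumption only: the quoted name 'chromedriver_win32' is the extracted folder rather than the executable, which suggests the driver path pointed at the directory instead of at `chromedriver.exe` inside it. A small pre-flight check like the sketch below can rule that out before Selenium is started; `diagnose_driver_path` is a hypothetical helper, not part of Selenium.

```python
import os
import tempfile


def diagnose_driver_path(path):
    """Return a short diagnosis of a chromedriver path (hypothetical helper)."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        # A folder was given; the path should end at the executable inside it.
        return "directory"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


# Usage: a temporary folder stands in for the extracted chromedriver_win32 directory.
with tempfile.TemporaryDirectory() as extracted_dir:
    folder_result = diagnose_driver_path(extracted_dir)
    missing_result = diagnose_driver_path(os.path.join(extracted_dir, "chromedriver.exe"))

print(folder_result)   # directory
print(missing_result)  # missing
```

With the path from the code above, the check would be `diagnose_driver_path(r'E:/Tools/chromedriver_win32/chromedriver.exe')`. Any result other than "ok" points to a path problem rather than a Selenium problem.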


References:

1. ChromeDriver - WebDriver for Chrome

2. Python, PhantomJS says I am not using headless?

3. screenshot() element method – Selenium Python


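Original content from Huang Bing's personal blog.

Please credit the source when reposting: Huang Bing's personal blog - A Summary of Web Scraping with Selenium and Chrome in Python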
