2020/4/19 10:07:15, Author: 黄兵
Python urllib user_agent
If you leave the urllib User-Agent unchanged, Python sends the following default User-Agent string:
[19/Apr/2020:10:02:44 +0800] "GET / HTTP/1.1" 200 47657 "-" "Python-urllib/3.7"
This line is taken from an Nginx access log; the last quoted field is the default urllib User-Agent.
To set custom urllib headers, use code like the following:
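The default value can also be inspected without sending a request. The following is a minimal sketch, assuming a standard CPython 3 install; `default_user_agent` is a hypothetical helper name, not something from the original post. It reads the default headers that a fresh opener attaches to every request.

```python
from urllib.request import build_opener


def default_user_agent():
    """Return the User-Agent a fresh urllib opener would send by default."""
    opener = build_opener()
    # addheaders is a list of (name, value) tuples applied to every request
    # that does not already carry a header of the same name.
    headers = dict(opener.addheaders)
    return headers["User-agent"]


if __name__ == "__main__":
    # Prints something of the form "Python-urllib/<major>.<minor>".
    print(default_user_agent())
```

The version suffix follows the running interpreter, which is why the log line above shows 3.7.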
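#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


def get_page_data(url):
    req = Request(url)
    # Replace the default "Python-urllib/x.y" User-Agent with a browser-like one.
    req.add_header('User-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36')
    try:
        # The request is actually sent by urlopen(), so this is the call to guard.
        html = urlopen(req)
    except (HTTPError, URLError) as e:
        logger.error('Error while running get_page_data: {error_message}'.format(error_message=e))
        return False
    # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning.
    return BeautifulSoup(html.read(), 'html.parser')


bsObj = get_page_data('https://www.materialtools.com/')
print(bsObj)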
Alternatively, a version that also falls back to urllib2 on Python 2:
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

req = Request('http://api.company.com/items/details?country=US&language=en')
req.add_header('apikey', 'xxx')
content = urlopen(req).read()
print(content)
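Headers can also be passed when the request object is created, instead of calling add_header afterwards. This is a minimal Python 3 sketch; `build_request` and `BROWSER_UA` are hypothetical names introduced for illustration. It sends nothing over the network and only shows how urllib stores the header.

```python
from urllib.request import Request

# Same browser-like User-Agent string used earlier in the post.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Create a Request whose User-Agent is set through the headers dict."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://www.materialtools.com/")
# urllib normalizes header names with str.capitalize(),
# so the stored key is "User-agent" rather than "User-Agent".
print(req.get_header("User-agent"))
```

Both spellings reach the server as a User-Agent header, because HTTP header names are case-insensitive.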
Afterwards, check the Nginx log again:
[19/Apr/2020:10:34:25 +0800] "GET / HTTP/1.1" 200 47658 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
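If no Nginx log is at hand, the same check can be made locally. The sketch below is an assumption-laden stand-in for the real server: it starts a throwaway HTTP server on 127.0.0.1 that echoes back the User-Agent it received. `EchoUserAgentHandler` and `fetch_echoed_user_agent` are hypothetical names, and proxies are disabled so the request stays on the local machine.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class EchoUserAgentHandler(BaseHTTPRequestHandler):
    """Reply to GET with the User-Agent header the client sent."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def fetch_echoed_user_agent(user_agent=None):
    """Send one request to a local echo server and return the User-Agent it saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgentHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        req = Request("http://127.0.0.1:%d/" % server.server_port)
        if user_agent is not None:
            req.add_header("User-Agent", user_agent)
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = build_opener(ProxyHandler({}))
        with opener.open(req, timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_echoed_user_agent())                # default urllib value
    print(fetch_echoed_user_agent("my-agent/1.0"))  # custom value
```

A header set on the request takes precedence over the opener's default, which matches the change seen between the two log lines.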
References:
1. How do I set headers using python's urllib?
Original post from 黄兵's personal blog.