2020/4/19 10:07:15, Author: Huang Bing (黄兵)

Python urllib user_agent

If you do not change it, Python's urllib sends the following default User-Agent string:

[19/Apr/2020:10:02:44 +0800] "GET / HTTP/1.1" 200 47657 "-" "Python-urllib/3.7"

This excerpt comes from an Nginx access log. The last quoted field on the line, "Python-urllib/3.7", is the default urllib User-Agent.
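The default value can also be read locally without sending a request. The snippet below is a minimal sketch that builds a standard opener and looks up its default headers; the variable names are illustrative, and the exact version suffix depends on the interpreter in use.

```python
from urllib.request import build_opener

# OpenerDirector.addheaders holds the headers sent with every request
# unless the Request object itself supplies a header of the same name.
opener = build_opener()
default_headers = dict(opener.addheaders)

# The suffix is the running interpreter's major.minor version.
default_user_agent = default_headers['User-agent']
print(default_user_agent)
```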

To set a custom urllib request header, use code like the following:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Build the request and override the default User-Agent header.
req = Request('https://www.materialtools.com/')
req.add_header('User-Agent',
               'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36')

try:
    # urlopen() is the call that can raise HTTPError or URLError.
    html = urlopen(req)
except (HTTPError, URLError) as e:
    print('Error while fetching the page: {error_message}'.format(error_message=e))
else:
    # Name the parser explicitly so BeautifulSoup does not have to guess.
    bsObj = BeautifulSoup(html.read(), 'html.parser')
    print(bsObj)

Alternatively, the following form works with both urllib.request (Python 3) and urllib2 (Python 2):

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

req = Request('http://api.company.com/items/details?country=US&language=en')
req.add_header('apikey', 'xxx')
content = urlopen(req).read()

print(content)
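When every request in a script should carry the same User-Agent, the header can be set once on an opener instead of on each Request. The sketch below assumes Python 3; the constant name CUSTOM_USER_AGENT is illustrative, and its value is the same browser string used in this post.

```python
from urllib.request import build_opener, install_opener

# Illustrative constant; any User-Agent string can be used here.
CUSTOM_USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                     'AppleWebKit/537.36 (KHTML, like Gecko) '
                     'Chrome/80.0.3987.163 Safari/537.36')

# Replace the opener's default header list, which otherwise holds
# only the Python-urllib User-Agent.
opener = build_opener()
opener.addheaders = [('User-Agent', CUSTOM_USER_AGENT)]

# After this call, urlopen() uses this opener for all later requests.
install_opener(opener)
```

A header added to an individual Request still takes precedence over the opener's default of the same name.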

After sending the request with the custom header, the Nginx log shows:

[19/Apr/2020:10:34:25 +0800] "GET / HTTP/1.1" 200 47658 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
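If server logs are not available, the header can be checked on the Request object before anything is sent. This is a minimal sketch that passes the header through the headers argument of the constructor; the constant name USER_AGENT is illustrative. Note that urllib stores header names with only the first letter capitalized, so the lookup key is 'User-agent'.

```python
from urllib.request import Request

# Illustrative constant holding the browser string used in this post.
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/80.0.3987.163 Safari/537.36')

# Headers can be supplied as a dict when the Request is created.
req = Request('https://www.materialtools.com/',
              headers={'User-Agent': USER_AGENT})

# Request normalizes names with str.capitalize(): 'User-Agent' -> 'User-agent'.
stored_headers = dict(req.header_items())
print(stored_headers)
print(req.get_header('User-agent'))
```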


References:

1. How do I set headers using python's urllib?


Original post from Huang Bing's personal blog.

When reposting, please credit the source: Huang Bing's personal blog - Python urllib user_agent
