2021/4/9 17:23:10, Author: 黄兵

BeautifulSoup Table Parsing Example

I recently needed to parse HTML tables with BeautifulSoup. The markup below is a sample of the page fragment that contains the tables:

<div class="rtfragc" id="rtfragc_asnr" data-hasmore="0" data-finished="1" data-clen="1469">
<table class="whois">
<tbody>
<tr>
<td>route</td>
<td><a href="/cidr/103.165.48.0-23">103.165.48.0/23</a></td>
</tr>
<tr>
<td>descr</td>
<td>National Board of Revenue (NBR), Bangladesh</td>
</tr>
<tr>
<td>origin</td>
<td><a href="/as/AS142021.html">AS142021</a></td>
</tr>
<tr>
<td>mnt-by</td>
<td>MAINT-NBORB-BD</td>
</tr>
<tr>
<td>last-modified</td>
<td>2021-03-29T08:39:50Z</td>
</tr>
<tr>
<td>source</td>
<td>APNIC</td>
</tr>
</tbody>
</table>
<h3>103.165.48.0/24 National Board of Revenue (NBR), Bangladesh source:APNIC</h3>
<table class="whois">
<tbody>
<tr>
<td>route</td>
<td><a href="/cidr/103.165.48.0-24">103.165.48.0/24</a></td>
</tr>
<tr>
<td>descr</td>
<td>National Board of Revenue (NBR), Bangladesh</td>
</tr>
<tr>
<td>origin</td>
<td><a href="/as/AS142021.html">AS142021</a></td>
</tr>
<tr>
<td>mnt-by</td>
<td>MAINT-NBORB-BD</td>
</tr>
<tr>
<td>last-modified</td>
<td>2021-03-29T08:41:36Z</td>
</tr>
<tr>
<td>source</td>
<td>APNIC</td>
</tr>
</tbody>
</table>
<table class="whois">
<tbody>
<tr>
<td>route</td>
<td><a href="/cidr/103.165.49.0-24">103.165.49.0/24</a></td>
</tr>
<tr>
<td>descr</td>
<td>National Board of Revenue (NBR), Bangladesh</td>
</tr>
<tr>
<td>origin</td>
<td><a href="/as/AS142021.html">AS142021</a></td>
</tr>
<tr>
<td>mnt-by</td>
<td>MAINT-NBORB-BD</td>
</tr>
<tr>
<td>last-modified</td>
<td>2021-03-29T08:42:50Z</td>
</tr>
<tr>
<td>source</td>
<td>APNIC</td>
</tr>
</tbody>
</table>
<div></div>
</div>

The parsing code:

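# get_cidr_report is assumed to be a BeautifulSoup object built from the page above
data = []
if get_cidr_report:
    container = get_cidr_report.find("div", {"id": "rtfragc_asnr"})
    # Walk every table with class "whois" inside the container div
    for table in container.find_all("table", {"class": "whois"}):
        for row in table.find_all("tr"):
            # Take the stripped text of each cell in the row
            cols = [ele.text.strip() for ele in row.find_all("td")]
            data.append([ele for ele in cols if ele])  # Get rid of empty values
print(data)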

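The code first locates the div with id="rtfragc_asnr" and finds every table with class="whois" inside it. For each table it reads the tr rows, extracts the text of the td cells in each row, and appends the resulting values to a Python list. The result is a single flat list of rows, each of the form [field, value], collected across all tables.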
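Because the flat list loses the boundary between tables, it can be convenient to keep one record per table instead. The following is a minimal, self-contained sketch of that variation, not the original code: the function name parse_whois_tables and the constant SAMPLE_HTML are hypothetical, and the markup is a shortened copy of the sample above. It assumes every useful row has exactly two cells and that field names do not repeat within a table (a repeated name would overwrite the earlier value).

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup (one table, three rows), for illustration only.
SAMPLE_HTML = """
<div class="rtfragc" id="rtfragc_asnr">
<table class="whois">
<tbody>
<tr><td>route</td><td><a href="/cidr/103.165.48.0-23">103.165.48.0/23</a></td></tr>
<tr><td>origin</td><td><a href="/as/AS142021.html">AS142021</a></td></tr>
<tr><td>source</td><td>APNIC</td></tr>
</tbody>
</table>
</div>
"""


def parse_whois_tables(html):
    """Return one dict per whois table, mapping the first cell to the second."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="rtfragc_asnr")
    if container is None:
        # The expected container is missing, so there is nothing to parse
        return []
    records = []
    for table in container.find_all("table", class_="whois"):
        record = {}
        for row in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if len(cells) == 2:  # skip rows that are not field/value pairs
                record[cells[0]] = cells[1]
        records.append(record)
    return records


records = parse_whois_tables(SAMPLE_HTML)
print(records)
```

Returning an empty list when the container div is missing avoids the AttributeError that a chained find(...).find_all(...) call raises when find returns None.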


