Python Script: Batch-Querying Website Weight - 成就云开发者社区

Batch-querying website weight with Aizhan

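Many people who test large batches of random sites for vulnerabilities probably check each site's weight first, and only then decide whether to submit a finding to Butian (补天) or Vulbox (漏洞盒子). Aizhan offers no way to run those lookups in bulk, which is frustrating, and as a beginner I found checking sites one by one exhausting. So I wrote this script. It is based on the script "Aizhan batch website weight query, version 2.0".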

Demo

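If you run the script from cmd, first change to the directory that contains it; otherwise it fails with a file-not-found error. If you use an IDE such as PyCharm, open the script's folder as a project.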
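The file-not-found error comes from relative paths being resolved against the current working directory. One way to avoid depending on it, sketched here as an assumption rather than as part of the original script, is to build paths from the script's own location. The helper name `beside_script` is hypothetical.

```python
from pathlib import Path


def beside_script(name, script_file):
    """Return the path of `name` in the same folder as `script_file`."""
    # resolve() makes the path absolute, so the current working
    # directory no longer matters when the file is opened later.
    return Path(script_file).resolve().parent / name


# Inside the real script you would pass __file__ as script_file, e.g.
# websites = beside_script("websites.txt", __file__)
```

With this helper the script could be started from any directory, at the cost of always reading and writing next to the script file.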

Save the sites you want to query in a websites.txt file in the same directory, one site per line.
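Each line may be a full URL or a bare domain. The sketch below mirrors the `urlparse` logic used in the script's `main()` to reduce both forms to the value that is appended to the Aizhan URL. The function names `normalize_site` and `load_sites` are illustrative and do not appear in the original.

```python
from urllib.parse import urlparse


def normalize_site(line):
    """Reduce one websites.txt line to a host name or bare domain."""
    entry = line.strip()  # drop the trailing \r\n
    parsed = urlparse(entry)
    # With a scheme (https://...), the host is in netloc.
    # Without one, urlparse puts the whole entry in path.
    return parsed.netloc if parsed.netloc else parsed.path


def load_sites(lines):
    """Normalize every non-blank line."""
    return [normalize_site(line) for line in lines if line.strip()]


sample = ["https://www.example.com/index.html\r\n", "example.org\n", "\n"]
print(load_sites(sample))  # ['www.example.com', 'example.org']
```

Note that a line such as `example.org/page` without a scheme is kept whole, because `urlparse` cannot tell where the host ends in that case.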

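Note: the script cannot guarantee that every query succeeds in a single run. Sites whose query fails are saved to Query failure.csv, and sites queried successfully are saved to webweight.csv.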
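The split between the two files could look roughly like the sketch below. It assumes that a query counts as failed when the page yields no weight value at all; the helper name `record_result` and that failure criterion are assumptions for illustration, not taken from the original script.

```python
import csv


def record_result(site, tags, ok_path="webweight.csv",
                  fail_path="Query failure.csv"):
    """Append `site` to the success or the failure CSV.

    `tags` is the list of weight strings extracted from the page.
    Returns True when a weight was recorded, False otherwise.
    """
    if not tags:
        # Nothing extracted: remember the site so it can be retried.
        with open(fail_path, "a", encoding="utf-8", newline="") as f:
            csv.writer(f).writerow([site])
        return False
    weight = str(tags[0])
    if weight == "n":  # the script treats 'n' as weight 0
        weight = "0"
    with open(ok_path, "a", encoding="utf-8", newline="") as f:
        csv.writer(f).writerow([site, weight])
    return True
```

Feeding the failure file back in as websites.txt on a second run is one simple way to retry the sites that were missed.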

Now for everyone's favorite part: the code.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2020/6/2 13:50
# @Author  : 王先森
# @Blog    : www.boysec.cn
# @Software: PyCharm
# @Function : Batch-query website weight
import csv
import threading
import time
from queue import Empty, Queue
from urllib.parse import urlparse

import requests
from lxml import etree

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36 SE 2.X MetaSr 1.0'}

# Serializes CSV writes so rows from different threads do not interleave
write_lock = threading.Lock()


class WebWeight(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        # Base URL of the Aizhan lookup page
        aizhanurl = 'https://www.aizhan.com/cha/'
        while True:
            # Take the next site to query; stop once the queue is drained.
            # get_nowait() avoids blocking forever if another thread takes
            # the last item between an empty() check and get().
            try:
                chaxunurl = self.queue.get_nowait()
            except Empty:
                break
            print("[+] Querying: " + chaxunurl)
            url = aizhanurl + chaxunurl
            time.sleep(4)  # wait 4 s between requests
            try:
                resp = requests.get(url, headers=header, timeout=30)
            except requests.RequestException as exc:
                print("[!] Request failed: " + url + " (" + str(exc) + ")")
                continue
            print("[-] Request URL: " + url)
            # Decode the raw bytes as UTF-8 (resp.encoding may be None)
            html = resp.content.decode('utf-8', errors='replace')
            tree = etree.HTML(html)
            tags = tree.xpath('//a[@id="baidurank_br"]//@alt')
            # Write the results
            # On Python 3 use the open() call below
            # On Python 2 it would be: open("webweight.csv", "a+")
            with write_lock:
                with open("webweight.csv", "a+", encoding='utf-8', newline='') as file:
                    # Create the csv writer
                    csvwriter = csv.writer(file)
                    for tag in tags:
                        strtag = str(tag)  # convert the alt attribute to a string
                        if strtag == 'n' or strtag == '0':
                            csvwriter.writerow([chaxunurl, '0'])
                            print("[+] Result: " + chaxunurl + " weight: 0")
                        else:
                            csvwriter.writerow([chaxunurl, strtag])
                            print("[+] Result: " + chaxunurl + " weight: " + strtag)

def main():
    # Write the CSV header row
    with open("webweight.csv", "a", encoding='utf-8', newline='') as file:
        csvwriter = csv.writer(file)
        csvwriter.writerow(['weburl', 'weight'])

    threads = []  # worker threads
    threads_count = 5  # number of threads

    # Work queue (Python 3; on Python 2 this would be Queue.Queue())
    queue = Queue()
    with open("websites.txt", "r") as file:
        for line in file:
            # Lines read from the file end with \r\n
            j = line.strip('\n').strip('\r')
            if not j:
                continue  # skip blank lines
            url = urlparse(j)
            if url.netloc:
                queue.put(url.netloc)
            else:
                queue.put(url.path)

    for i in range(threads_count):
        # Create the worker threads
        threads.append(WebWeight(queue))
    # Start the threads one second apart, then wait for all of them
    for i in threads:
        time.sleep(1)
        i.start()
    for i in threads:
        i.join()

    print("Results saved in webweight.csv")


if __name__ == '__main__':
    main()
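The threading pattern used by the script, several workers pulling from one shared queue until it is empty, can be tried in isolation with the standard library alone. This is a minimal sketch under stated assumptions: the function `drain` and its `handle` callback are hypothetical stand-ins for the `WebWeight` class and the Aizhan request.

```python
import threading
from queue import Empty, Queue


def drain(queue, handle, workers=5):
    """Run `handle` on every queued item using `workers` threads."""
    results = []
    lock = threading.Lock()  # protects the shared results list

    def worker():
        while True:
            try:
                # Non-blocking get: exit cleanly when nothing is left.
                item = queue.get_nowait()
            except Empty:
                return
            value = handle(item)
            with lock:
                results.append((item, value))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


q = Queue()
for site in ["a.example", "b.example", "c.example"]:
    q.put(site)
# len stands in for the real lookup; completion order varies, so sort.
print(sorted(drain(q, len)))
```

Using `get_nowait()` rather than checking `empty()` first matters once there are several workers: two threads can both see a non-empty queue, and the one that loses the race would otherwise block on `get()` indefinitely.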