
  • Size: 2 KB
    File type: .rar
    Coins: 2
    Downloads: 1
    Published: 2021-07-01
  • Language: Python
  • Tags: batch crawling

Resource description

Batch-crawls the real URLs behind Baidu search results and supports Google-hacking-style search queries. Read ReadMe.txt before use. Written in Python.
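
Search operators such as inurl: contain characters that must be percent-encoded in a query string. The following is a minimal sketch of building a results-page URL with the standard library. It assumes the wd (query), pn (result offset, 10 per page) and ie (encoding) parameters that appear in the script's request URL; build_search_url is a hypothetical helper and is not part of baidu_search.py.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.baidu.com/s"  # endpoint taken from the script's request URL


def build_search_url(query, page):
    """Return the URL for one results page (hypothetical helper).

    Assumption: pn is a result offset with 10 results per page, which
    matches the script appending "0" to the page number.
    """
    params = {
        "wd": query,        # search terms, percent-encoded by urlencode
        "pn": page * 10,    # offset of the first result on this page
        "ie": "utf-8",      # input encoding
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("inurl:php?id=", 2))
```

Encoding the query this way keeps characters such as ":", "?" and "=" from being misread as part of the URL structure.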

Code snippet and file information

# -*- coding: utf-8 -*-

import requests
from bs4 import BeautifulSoup
from multiprocessing import Pool

# Browser-like request headers sent with every request.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, compress',
    'Accept-Language': 'en-us;q=0.5,en;q=0.3',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0',
}


def gain_url(url):
    """Follow a Baidu result link and append the final URL to url.txt."""
    try:
        msg = requests.get(url, headers=headers)
    except requests.RequestException:
        print("Page could not be accessed")
        return
    # msg.url is the address after redirects, i.e. the real result URL.
    with open("url.txt", "a+") as f:
        f.write(msg.url + '\n')
    print(msg.url)


def gain_result(q, page):
    """Fetch one Baidu results page and resolve every result link on it."""
    # pn is the result offset: the page number followed by "0" (page * 10).
    url = "https://www.baidu.com/s?wd=" + str(q) + "&pn=" + str(page) + "0&oq=inurl%3Aphp%3Fid%3D&ie=utf-8&rsv_idx=1&rsv_pq=9b97f395000066c1&rsv_t=d7cawxJPu0sWL9GdZrVerLepGX%2Bm%2B8Gz%2BP%2BCna7MCI7Ji%2FJwpzkV0uwY7D4"

    msg = requests.get(url, headers=headers)
    soup = BeautifulSoup(msg.content, "html.parser")
    url_list = soup.find_all(class_="c-tools")
    for tag in url_list:
        text = str(tag)
        # The link sits between 'url":"' and the quote just before "}".
        start = text.find("url\":")
        end = text.find("}")
        end_url = text[start + 6:end - 1]
        gain_url(end_url)


def chongfu():
    """Remove duplicate lines from url.txt and write them to new_urls.txt."""
    with open("url.txt", "r") as f:
        url_list = list(set(f.readlines()))
    with open("new_urls.txt", "a+") as f:
        for url in url_list:
            f.write(url)


if __name__ == "__main__":
    q = input("Enter the Baidu search query: ")
    m = input("Enter the total number of pages to crawl: ")
    pnums = input("Enter the number of worker processes in the pool: ")
    p = Pool(int(pnums))
    # i is the Baidu results page number; starmap spreads the pages across the pool.
    p.starmap(gain_result, [(q, i) for i in range(0, int(m))])

    p.close()
    p.join()
    chongfu()
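
chongfu deduplicates with a set, which discards the order in which the URLs were found and treats lines that differ only in trailing whitespace as distinct. The sketch below shows an order-preserving alternative that uses only the standard library. dedupe_lines and dedupe_file are hypothetical names and are not part of baidu_search.py; the file names default to the ones the script uses.

```python
def dedupe_lines(lines):
    """Return unique, non-empty lines in first-seen order, whitespace stripped."""
    cleaned = (line.strip() for line in lines)
    # dict preserves insertion order, so fromkeys keeps the first occurrence.
    return list(dict.fromkeys(line for line in cleaned if line))


def dedupe_file(src="url.txt", dst="new_urls.txt"):
    """Read src, drop duplicate lines, and overwrite dst with the result."""
    with open(src, "r", encoding="utf-8") as f:
        unique = dedupe_lines(f)
    with open(dst, "w", encoding="utf-8") as f:
        f.writelines(url + "\n" for url in unique)
    return unique
```

Opening the destination in "w" mode rather than "a+" means repeated runs do not append earlier results again, which is a deliberate difference from the script.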

 Attribute       Size  Date        Time   Name
----------- ---------  ---------- -----  ----
       File       727  2018-04-01 18:24  ReadMe.txt
       File      1929  2018-04-01 18:21  baidu_search.py
----------- ---------  ---------- -----  ----
                 2656                    2

