
Python web scraping exercises

2024/10/25 0:32:34 Source: https://blog.csdn.net/qq_42307546/article/details/142812071

Target website:
https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/

Exercise 1: fetch the API response and save it

import json

import requests

url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/api/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'}

# Request the API and parse the JSON response body
res = requests.get(url, headers=headers, timeout=10).json()

# ensure_ascii=False keeps non-ASCII text as-is, so the file needs an explicit utf-8 encoding
with open('1.json', 'w', encoding='utf-8') as f:
    json.dump(res, f, ensure_ascii=False)
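Because ensure_ascii=False writes non-ASCII characters directly instead of as \uXXXX escapes, the encoding the file is opened with matters (the platform default on Windows is often not UTF-8). The sketch below runs offline: it saves a small payload and reads it back. The payload, the helper names save_json/load_json and the temporary path are hypothetical; only the articles/image keys come from these exercises.

```python
import json
import os
import tempfile


def save_json(data, path):
    """Write data as JSON, keeping non-ASCII text readable."""
    # ensure_ascii=False emits raw non-ASCII characters, so open the file
    # with an explicit encoding rather than relying on the platform default
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json(path):
    """Read the JSON file back with the same encoding it was written with."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical payload shaped like the API response used in exercise 2
sample = {"articles": [{"title": "示例", "image": "/media/a.jpg"}]}

path = os.path.join(tempfile.mkdtemp(), "1.json")
save_json(sample, path)

# Inspect the raw file text: the Chinese characters are stored literally
with open(path, encoding="utf-8") as f:
    raw = f.read()
```

With the default ensure_ascii=True the same data would be written as escape sequences, which is valid JSON but unreadable when opening 1.json by hand.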

Exercise 2: download all images

from urllib.parse import urljoin, urlparse

import requests

url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/api/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'}
res = requests.get(url, headers=headers, timeout=10).json()

# Collect the image path of every article in the API response
list2 = [article['image'] for article in res['articles']]

# urljoin resolves relative image paths against the site root (absolute URLs stay unchanged)
base_url = "https://" + urlparse(url).netloc
for image in list2:
    image_url = urljoin(base_url, image)
    img = requests.get(image_url, headers=headers, timeout=10).content
    img_name = image.split("/")[-1]  # last path segment becomes the file name
    with open(img_name, 'wb') as f:
        f.write(img)
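The two URL steps in this exercise (building an absolute image URL and picking a file name) can be checked offline. The sketch below is a hypothetical variant: resolve_image_url reuses the API URL's own scheme instead of hard-coding "https://", and image_filename takes the name from the URL path only, so a query string such as ?v=2 does not end up in the file name the way it would with split("/"). The example.com URLs are made up.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def resolve_image_url(api_url, image):
    """Turn an image path from the API into an absolute URL on the same host."""
    parts = urlparse(api_url)
    base_url = f"{parts.scheme}://{parts.netloc}"
    # urljoin keeps absolute URLs as they are and resolves relative ones
    # against the host root
    return urljoin(base_url, image)


def image_filename(image_url):
    """Derive a file name from the URL path, ignoring any query string."""
    return posixpath.basename(urlparse(image_url).path)


api = "https://example.com/tasks/api/"
print(resolve_image_url(api, "/media/a.jpg"))
print(image_filename("https://example.com/media/a.jpg?v=2"))
```

Both print calls show the resolved URL https://example.com/media/a.jpg and the file name a.jpg respectively.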

Exercise 3: scrape article titles and abstracts

import csv
from urllib.parse import urljoin

import requests
from lxml import etree

base = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/"
url = base + "tasks/article/list/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'}

# Collect the link to every article on the list page
res = requests.get(url, headers=headers, timeout=10)
html = etree.HTML(res.text)
wen_zhang = html.xpath('//div[@class="lab-block"]//a//@href')

# Open the CSV once; utf-8-sig adds a BOM so Excel detects the encoding
with open("data.csv", "w", newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(["title", "abstract"])
    for i in wen_zhang:
        # urljoin avoids a double slash when the href already starts with "/"
        page = requests.get(urljoin(base, i), headers=headers, timeout=10)
        select = etree.HTML(page.text)
        timu = select.xpath('//h2/text()')[0]          # article title
        zaiyao = "".join(select.xpath('//p//text()'))  # all paragraph text as the abstract
        writer.writerow([timu, zaiyao])
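To check the extraction logic without hitting the site, the sketch below approximates the XPaths '//h2/text()' and '//p//text()' with the standard library's html.parser (a swapped-in stand-in, not lxml) on a made-up article page, then writes the header and row through a single csv.writer. It is only an approximation: '//h2/text()' returns direct text nodes, while this parser collects all text inside the first h2. The class ArticleParser, the helpers extract/write_csv and the sample HTML are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Rough stdlib stand-in for the lxml XPaths '//h2/text()' and '//p//text()'."""

    def __init__(self):
        super().__init__()
        self.h2_parts = []   # text inside the first <h2> on the page
        self.p_parts = []    # text inside every <p>, including nested tags like <b>
        self._in_h2 = False
        self._h2_done = False
        self._p_depth = 0    # greater than zero while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and not self._h2_done:
            self._in_h2 = True
        elif tag == "p":
            self._p_depth += 1

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self._in_h2 = False
            self._h2_done = True
        elif tag == "p" and self._p_depth:
            self._p_depth -= 1

    def handle_data(self, data):
        if self._in_h2:
            self.h2_parts.append(data)
        if self._p_depth:
            self.p_parts.append(data)


def extract(html_text):
    """Return (title, abstract) for one article page."""
    parser = ArticleParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.h2_parts), "".join(parser.p_parts)


def write_csv(f, rows):
    """Write the header and all rows through one csv.writer on one open file."""
    writer = csv.writer(f)
    writer.writerow(["title", "abstract"])
    writer.writerows(rows)


# Made-up article page; the real site's markup is an assumption
SAMPLE = "<html><body><h2>Sample title</h2><p>First <b>bold</b> part.</p><p>Second part.</p></body></html>"

buf = io.StringIO()
write_csv(buf, [extract(SAMPLE)])
rows = list(csv.reader(io.StringIO(buf.getvalue())))
```

In a real run, f would be open("data.csv", "w", newline="", encoding="utf-8-sig") instead of the in-memory buffer; keeping one handle open for the whole loop avoids mixing encodings or overwriting rows with a second handle.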
