Downloading the 2020 Gaokao College Database and Saving It to Excel

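Results (only part is shown for reasons of space; the full list covers more than two thousand colleges nationwide):
[Image: 2020 Gaokao college database saved to Excel]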

This is the data on the original web page:
[Image: the original web page]


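Approach:
The page source shows that the data is static rather than loaded by asynchronous requests, so the page URLs can simply be constructed directly.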


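[Image: page URLs compared, with the changing part boxed in red]

Comparing the URLs of successive pages shows that the only part that has to be constructed is the part boxed in red, the start- value, which increases by 20 from one page to the next.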
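The pagination described above can be sketched as follows. The helper name page_urls and its total and page_size parameters are introduced here only for illustration; the URL template and the bound of 2820 are taken from the source code in this post.

```python
# URL template from this post; only the start offset changes between pages.
BASE_URL = ("https://gaokao.chsi.com.cn/sch/search--ss-on,"
            "searchType-1,option-qg,start-{}.dhtml")


def page_urls(total=2820, page_size=20):
    """Return one URL per result page (hypothetical helper).

    total is the exclusive upper bound of the start offset and page_size is
    the number of schools per page; both defaults come from the post.
    """
    return [BASE_URL.format(start) for start in range(0, total, page_size)]


urls = page_urls()
print(len(urls))  # number of pages that will be requested
print(urls[1])    # second page, start offset 20
```

Building the full list first makes it easy to check the first and last offsets before sending any requests.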
XPath is a convenient way to extract table data.
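As a minimal sketch of XPath extraction with lxml, the snippet below parses a small hand-written table rather than the real page. The sample markup, the helper name parse_rows, and the choice to read one row at a time are assumptions made for illustration; reading row by row keeps the cells of each school together even when a cell is empty.

```python
from lxml import etree

# A tiny hand-written table standing in for one results page (not real data).
SAMPLE = """
<div><table>
  <tr><th>Name</th><th>Location</th><th>Type</th></tr>
  <tr><td><a href="#"> School A </a></td><td>Beijing </td><td>Comprehensive</td></tr>
  <tr><td><a href="#">School B</a></td><td>Shanghai</td><td></td></tr>
</table></div>
"""


def parse_rows(page_source):
    """Return one list of cleaned cell strings per data row (hypothetical helper)."""
    html = etree.HTML(page_source)
    rows = []
    # Select only rows that contain <td> cells, which skips the header row.
    for tr in html.xpath('//div/table/tr[td]'):
        # string(.) joins all text inside a cell, including text in a nested <a>.
        cells = [td.xpath('string(.)').strip() for td in tr.xpath('./td')]
        rows.append(cells)
    return rows


for row in parse_rows(SAMPLE):
    print(row)
```

An empty cell comes back as an empty string, so every row keeps the same number of columns.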
Source code:
import requests
from lxml import etree
import openpyxl

# Column headers: school name, location, governing department,
# school type, education level, satisfaction rating.
title = ['院校名称', '院校所在地', '教育主管部门', '院校类型', '学历层次', '满意度']
workbook = openpyxl.Workbook()
sheet = workbook.worksheets[0]
sheet.append(title)


def writefile(school, destination, party, schooltype, floattype, score):
    # Append one worksheet row per school found on the current page.
    for i in range(len(school)):
        sheet.append([school[i], destination[i], party[i], schooltype[i], floattype[i], score[i]])


def replacet(who):
    # Strip spaces and newlines from every extracted string.
    for i in range(len(who)):
        who[i] = who[i].replace(' ', '').replace('\n', '')
    return who


def get(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/78.0.3904.97 Safari/537.36",
    }
    response = requests.get(url, headers=headers).text
    html = etree.HTML(response, etree.HTMLParser())
    # Each XPath returns one column of the results table as a list of strings.
    school = html.xpath('//div/table/tr/td[1]/a/text()')
    destination = html.xpath('//div/table/tr/td[2]/text()')
    party = html.xpath('//div/table/tr/td[3]/text()')
    schooltype = html.xpath('//div/table/tr/td[4]/text()')
    floattype = html.xpath('//div/table/tr/td[5]/text()')
    score = html.xpath('//div/table/tr/td[9]/a/text()')
    school = replacet(school)
    destination = replacet(destination)
    party = replacet(party)
    schooltype = replacet(schooltype)
    floattype = replacet(floattype)
    score = replacet(score)
    writefile(school, destination, party, schooltype, floattype, score)


if __name__ == '__main__':
    # The start offset grows by 20 per page.
    for p in range(0, 2820, 20):
        print('Starting at offset {}'.format(p))
        try:
            get('https://gaokao.chsi.com.cn/sch/search--ss-on,searchType-1,option-qg,start-{}.dhtml'.format(p))
            print('Saved offset {}'.format(p))
        except Exception:
            # A failed page is reported and skipped so the remaining pages still run.
            print('Failed to save offset {}'.format(p))
    # Write the workbook once, after all pages have been processed.
    workbook.save('2020高考高校信息库.xlsx')
    workbook.close()
Done!
