Implementing video scraping, automatic commenting, and automatic liking

I. [Key concepts]:

   Capturing dynamically loaded data (packet inspection)

   Sending requests with requests

   Parsing JSON data

II. [Development environment]:

   python 3.8               runs the code

   pycharm 2021.2           helps with writing the code

   requests                 pip install requests

III. Crawler example:

   Collect videos from the Kuaishou short-video site

   Identify where the data comes from

    https://www.kuaishou.com/graphql

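Because the data comes from a single GraphQL endpoint, every request is a POST whose JSON body names an operation, carries the query text, and supplies variables. The sketch below only assembles such a body and sends nothing. `build_graphql_body` is a hypothetical helper name, and the short query is a placeholder rather than the real query text captured from the site.

```python
import json


def build_graphql_body(operation_name, query, variables):
    """Assemble the fields this tutorial's GraphQL POST body carries."""
    return {
        'operationName': operation_name,
        'query': query,
        'variables': variables,
    }


# Placeholder query for illustration only; the captured query is much longer.
body = build_graphql_body(
    'visionProfilePhotoList',
    'query visionProfilePhotoList($userId: String) { __typename }',
    {'userId': '3xhv7zhkfr3rqag', 'pcursor': '', 'page': 'profile'},
)
print(json.dumps(body, indent=2))
```
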
IV. Implementation steps:

   1. Send the request

   2. Get the data

   3. Parse the data

   4. Save the data
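
Steps 3 and 4 can be tried offline before touching the network. The sketch below parses a made-up response that has the same nesting as the one this tutorial reads (`data` → `visionProfilePhotoList` → `feeds`), then derives a file name for each video. `parse_feeds`, `safe_filename`, and the sample values are hypothetical and exist only for illustration.

```python
import re

# Made-up sample with the nesting this tutorial parses;
# a real response carries many more fields.
sample_response = {
    'data': {
        'visionProfilePhotoList': {
            'pcursor': 'no_more',
            'feeds': [
                {'author': {'id': 'author_1'},
                 'photo': {'id': 'photo_1',
                           'caption': 'demo: clip/1?\n',
                           'photoUrl': 'https://example.com/1.mp4'}},
            ],
        }
    }
}


def parse_feeds(json_data):
    """Step 3: return (caption, photoUrl) pairs and the next-page cursor."""
    page = json_data['data']['visionProfilePhotoList']
    videos = [(f['photo']['caption'], f['photo']['photoUrl'])
              for f in page['feeds']]
    return videos, page['pcursor']


def safe_filename(caption):
    """Step 4 prep: drop characters that Windows forbids in file names."""
    return re.sub(r'[\\/:*?"<>|\n]', '', caption)


videos, next_cursor = parse_feeds(sample_response)
for caption, photo_url in videos:
    print(safe_filename(caption) + '.mp4', photo_url)
```

Cleaning the caption first matters because captions are free text and may contain slashes, colons, or line breaks that would make `open()` fail or write to an unexpected path.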

Crawler:

   A program that poses as a browser and sends requests to a server
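
In practice, "posing as a browser" mostly means attaching the headers a browser would send. The sketch below builds a request object and prints what would go out, without sending anything. It uses the standard library's `urllib.request` instead of requests purely so that the request can be inspected offline. The shortened User-Agent, the Referer value, and the tiny query are assumptions for illustration, not values captured from the site.

```python
import json
import urllib.request

# Hypothetical minimal body; the captured query text is far longer.
body = {
    'operationName': 'visionProfilePhotoList',
    'query': 'query visionProfilePhotoList { __typename }',
    'variables': {},
}

# Build the request object only; nothing is sent over the network here.
req = urllib.request.Request(
    'https://www.kuaishou.com/graphql',
    data=json.dumps(body).encode('utf-8'),
    headers={
        'content-type': 'application/json',
        'Referer': 'https://www.kuaishou.com/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    },
    method='POST',
)

print(req.get_method(), req.full_url)
for name, value in req.header_items():
    print(f'{name}: {value}')
```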

V. Complete code

import requests     # third-party module for sending requests
import re

headers = {
    'content-type': 'application/json',
    # user login info
    'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_d3f9d8c2cbebafd126b80eb0b1c13360; client_key=65890b29; didv=1658130458000; userId=270932146; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABCj1Pe61TcGTRmOxDP2F7J-5buR1I6zTbr2o8VylTwBIilBXkjnTbXau3z8OK1r-i0YIefozg8oheW-VO5_33SX0PmlNy5A8bmqSsJXZocyw3CusEfPPuVrgD6zZlzHSqW-M7GKTSptfCJ6of43qs700fYxwy-yrx13---JA62jliXOadl2OOT9f_A7W7DdIhT8rMQtFFdodh_frGf3CyBhoSoJCKbxHIWXjzVWap_gGna5KjIiB6FJHOKt3vnbSSWhl2W0DWrtjoA1X_lW9zlGlRaYHPkSgFMAE; kuaishou.server.web_ph=7ee6499c7437971b1182aa3bb1ba1c645b9f',
    # domain name
    'Host': 'www.kuaishou.com',
    'Origin': 'https://www.kuaishou.com',
    # anti-hotlinking check
    'Referer': 'https://www.kuaishou.com/profile/3xsxdmstbwbx4ba',
    # basic browser info
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
}
url = 'https://www.kuaishou.com/graphql'

def get_page(pcursor):
    # GraphQL request body for one page of the profile's video list
    payload = {
        'operationName': "visionProfilePhotoList",
        'query': "fragment photoContent on PhotoEntity {\n id\n duration\n caption\n likeCount\n viewCount\n realLikeCount\n coverUrl\n photoUrl\n photoH265Url\n manifest\n manifestH265\n videoResource\n coverUrls {\n url\n __typename\n }\n timestamp\n expTag\n animatedCoverUrl\n distance\n videoRatio\n liked\n stereoType\n profileUserTopPhoto\n __typename\n}\n\nfragment feedContent on Feed {\n type\n author {\n id\n name\n headerUrl\n following\n headerUrls {\n url\n __typename\n }\n __typename\n }\n photo {\n ...photoContent\n __typename\n }\n canAddComment\n llsid\n status\n currentPcursor\n __typename\n}\n\nquery visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n result\n llsid\n webPageArea\n feeds {\n ...feedContent\n __typename\n }\n hostName\n pcursor\n __typename\n }\n}\n",
        'variables': {'userId': "3xhv7zhkfr3rqag", 'pcursor': pcursor, 'page': "profile"}
    }
    # 1. Send the request
    response = requests.post(url=url, headers=headers, json=payload)
    # 2. Get the data
    json_data = response.json()
    # 3. Parse the data
    feeds = json_data['data']['visionProfilePhotoList']['feeds']
    pcursor = json_data['data']['visionProfilePhotoList']['pcursor']
    for feed in feeds:
        caption = feed['photo']['caption']
        photoUrl = feed['photo']['photoUrl']
        print(caption, photoUrl)
        photoAuthorId = feed['author']['id']
        photoId = feed['photo']['id']
        # Request body of the like mutation for this video
        like_payload = {
            'operationName': "visionVideoLike",
            'query': "mutation visionVideoLike($photoId: String, $photoAuthorId: String, $cancel: Int, $expTag: String) {\n visionVideoLike(photoId: $photoId, photoAuthorId: $photoAuthorId, cancel: $cancel, expTag: $expTag) {\n result\n __typename\n }\n}\n",
            'variables': {
                'cancel': 0,
                'expTag': "1_i/2005282647926093489_xpcwebprofilexxnull0",
                'photoAuthorId': photoAuthorId,
                'photoId': photoId
            }
        }
        requests.post(url=url, headers=headers, json=like_payload)
        # Strip characters that are not allowed in file names
        # caption = re.sub(r'[\\/:*?"<>|\n]', '', caption)
        # # 4. Save the data
        # video_data = requests.get(photoUrl).content
        # with open(f'video/{caption}.mp4', mode='wb') as f:
        #     f.write(video_data)
    # 'no_more' marks the last page; otherwise fetch the next page
    if pcursor == 'no_more':
        return 0
    get_page(pcursor)

get_page("")
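
The recursive call at the end of `get_page` can also be written as a loop, which avoids growing the call stack on profiles with many pages. The sketch below keeps the same stopping rule (`pcursor == 'no_more'`) but takes the page-fetching function as a parameter, so it can be exercised with fake data. `iter_pages`, `fake_fetch`, and the cursor values are hypothetical names and data, not part of the site's API.

```python
def iter_pages(fetch_page, first_cursor=''):
    """Yield each page's feeds, following pcursor until it reads 'no_more'."""
    pcursor = first_cursor
    while pcursor != 'no_more':
        feeds, pcursor = fetch_page(pcursor)
        yield feeds


# Stand-in for the real network call: maps a cursor to (feeds, next cursor).
fake_pages = {
    '': (['video A', 'video B'], 'cursor-1'),
    'cursor-1': (['video C'], 'no_more'),
}


def fake_fetch(pcursor):
    return fake_pages[pcursor]


for feeds in iter_pages(fake_fetch):
    print(feeds)
```

In a real run, `fetch_page` would wrap the POST request and return the `feeds` list together with the `pcursor` value from the response.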