Example Code for Searching Google Scholar Paper Information with Python

Tags: searching Google Scholar papers with Python, searching papers with Python, Python, Google · 2023-03-06 11:03:27 · 127 views · 泡泡鱼

Sample data

Sample code

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm
from pybtex.database import BibliographyData, Entry
from pybtex.database.input import bibtex
import pandas as pd
import time
import json
import random
 
 
def search_doi(doi):
    '''Look up a paper's metadata on Crossref by DOI.'''
    url = f'https://api.crossref.org/works/{doi}'
    response = requests.get(url)
    result = None
    if response.status_code == 200:
        result = response.json()['message']
    else:
        print(f'Crossref lookup failed with status {response.status_code}')
    return result
 
# Pass the bare DOI rather than the publisher URL.
# doi = '10.1145/3394486.3403237'
# result = search_doi(doi)
# print(f"Title: {result['title'][0]}:{result['subtitle'][0]}")
# print(f"Author(s): {', '.join(author['given'] + ' ' + author['family'] for author in result['author'])}")
# print(f"Journal: {result['container-title'][0]}")
# print(f"Publication Date: {result['published-print']['date-parts'][0][0]}")
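
Crossref's `works` endpoint expects a bare DOI such as `10.1145/3394486.3403237`, while DOIs are often copied as part of a publisher URL. A minimal sketch of pulling the DOI out of such a string follows. `extract_doi` and its regular expression are illustrative assumptions, not part of the original script, and the pattern only covers the common `10.<registrant>/<suffix>` shape.

```python
import re

# Hypothetical helper pattern: matches the common "10.<registrant>/<suffix>" DOI shape.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s?#"<>]+')


def extract_doi(text):
    """Return the first DOI-like substring in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop trailing punctuation that often follows a DOI in running text.
    return match.group(0).rstrip('.,;')


print(extract_doi('https://dl.acm.org/doi/abs/10.1145/3394486.3403237'))
```

The returned string can then be passed to `search_doi`.
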
 
 
def search_cite(atid):
    '''Fetch the citation formats for a result by its atid.'''
    url = f'https://scholar.google.com/scholar?q=info:{atid}:scholar.google.com/&output=cite&scirp=8&hl=zh-CN'
    resp = requests.get(url)
    soup = BeautifulSoup(resp.text, 'lxml')
    result = {}
    for item in soup.find_all('tr'):
        cith = item.find('th', class_='gs_cith').getText()
        citr = item.find('div', class_='gs_citr').getText()
        result[cith] = citr
    return result
 
# result = search_cite('_goqYZv1zjMJ')
# print(result)
 
 
 
# Switch the active Clash proxy node
def change_clash_node(node_name=None):
    # URL and secret of the local Clash API
    url = 'http://127.0.0.1:15043/proxies/?国外流量'
    password = 'ee735f4e-59c6-4d60-a2ad-aabd075badb2'
    # Node names as they appear in the local Clash configuration
    local_node_name = ['香港1-IEPL-倍率1.0', '香港2-IEPL-倍率1.0', '香港3-IEPL-倍率1.0', 
                       '台湾1-IEPL-倍率1.0', '台湾2-IEPL-倍率1.0', '台湾3-IEPL-倍率1.0',
                       '新加坡1-IEPL-倍率1.0', '新加坡2-IEPL-倍率1.0', '新加坡3-IEPL-倍率1.0'
                       ]
    node_name = node_name or random.choice(local_node_name)
    print(f'Selected node: {node_name}')
    
    headers = {'Authorization': password}
    data = {
        'name': 'Rule',
        'type': 'Selector',
        'now': node_name
    }
    response = requests.put(url, headers=headers, json=data)
    # Treat any 2xx status as success
    if response.ok:
        print('Node switched to:', node_name)
    else:
        print('Failed to switch node:', response.text)
 
# Switch to a random node from the list above
# change_clash_node()
 
 
 
def proxy_requests(url):
    proxies = {
        'http': 'socks5://127.0.0.1:7890',
        'https': 'socks5://127.0.0.1:7890'
    }
    return requests.get(url, proxies=proxies)
 
 
def search(title='GNN', start=0):
    url = f'https://scholar.google.com/scholar?start={start}&q=allintitle:+{title}&hl=zh-CN&as_sdt=0,5'
    resp = proxy_requests(url)
    soup = BeautifulSoup(resp.text, 'lxml')
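    try:
        papers_item = soup.find(id='gs_res_ccl_mid').find_all('div', class_='gs_scl')
    except AttributeError:
        # The results container is missing: a captcha page or an unexpected layout
        print(soup)
        if 'captcha-form' in resp.text:
            return -1
        # Not a captcha: let the caller's AttributeError handler stop the crawl
        raise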
    papers_info = []
    for paper in papers_item:
        publisher = paper.find('div', class_='gs_or_ggsm').getText().split()[1].split('.')[0]
        href = paper.find('h3', class_='gs_rt').find('a').get('href')
        title = paper.find('h3', class_='gs_rt').find('a').getText()
        detail = paper.find('div', class_='gs_ri').find('div', class_='gs_a').getText()
        year = detail.split(',')[-1].strip()[:4]
        
        # atid = paper.find('h3', class_='gs_rt').find('a').get('data-clk-atid')
        # cite_info = search_cite(atid)['MLA']
        # cite_info_filter = list(filter(lambda x:x, map(lambda x:x.strip().strip('"').strip(), cite_info.strip().split('.'))))
        # author, title, publisher, year = cite_info_filter
        
        papers_info.append({'title':title, 'year':year, 'publisher':publisher, 'href':href})
    return papers_info
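
`search` takes the year from the text after the last comma of the `gs_a` line, which only works when that line contains a comma directly before the year. A regular expression is a more tolerant alternative, sketched below. `extract_year` is a hypothetical helper, and the assumed layout of the `gs_a` text ("authors - venue, year - site") is based on typical result pages and not guaranteed by Google Scholar.

```python
import re

# Assumed shape of the `gs_a` text: "authors - venue, year - site".
# Matches standalone four-digit years from 1900 to 2099.
YEAR_PATTERN = re.compile(r'\b(?:19|20)\d{2}\b')


def extract_year(detail):
    """Return the last four-digit year found in detail, or '' if none is present."""
    years = YEAR_PATTERN.findall(detail)
    # The year usually comes after the venue, so prefer the last match.
    return years[-1] if years else ''


print(extract_year('A Author, B Author - Some Journal, 2020 - example.org'))
print(extract_year('A Author - 2019 - example.org'))
```

Taking the last match is a heuristic: it skips year-like numbers that appear earlier in the venue name, but it can still be wrong for unusual entries.
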
 
 
 
 
index_start = 0
index_end = 500
index_gap = 10
papers_store = []
bar = tqdm(total=index_end-index_start, desc=f'From {index_start} to {index_end}')
# for start in range(index_start, index_end, index_gap):
while index_start < index_end:
    try:
        papers_info = search(title='GNN', start=index_start)
        if papers_info == -1:
            print('Captcha required; switching node and retrying in 2 seconds')
            change_clash_node()
            time.sleep(2)
            continue
        papers_store.extend(papers_info)
    except AttributeError as e:
        print(e)
        break
        
    index_start += index_gap
    bar.update(index_gap)
    bar.refresh()
    time.sleep(0.1)
bar.close()
 
df = pd.DataFrame(papers_store)  # all collected pages, not just the last one
print(df)
df.to_csv('data.csv', index=False)
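
The driver loop retries without limit whenever `search` returns -1, so a persistent captcha keeps it switching nodes forever. One possible way to bound this is a small retry wrapper with exponential backoff, sketched here. `fetch_with_retries` and its parameters are hypothetical names introduced for illustration; only the "-1 means blocked" convention comes from the script above.

```python
import time


def fetch_with_retries(fetch, on_blocked, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns something other than -1.

    fetch      -- zero-argument callable; returns -1 when blocked (same convention as search())
    on_blocked -- zero-argument callable run after each blocked attempt, e.g. a node switch
    sleep      -- injectable so the wait can be skipped in tests
    Returns the fetch result, or None if every attempt was blocked.
    """
    for attempt in range(max_retries):
        result = fetch()
        if result != -1:
            return result
        on_blocked()
        # Wait base_delay, then twice that, and so on, to slow down after repeated blocks.
        sleep(base_delay * 2 ** attempt)
    return None
```

Inside the loop this could be called as `fetch_with_retries(lambda: search(title='GNN', start=index_start), change_clash_node)`, stopping the crawl when it returns `None`.
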

That concludes the example code for searching Google Scholar paper information with Python.

Article link: https://www.lsjlt.com/news/198589.html (please credit the source link when reposting)
