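In an era of rapid technological change, PDF files have become an indispensable part of many people's work and daily lives. Analyzing and translating PDF documents, however, is still a fairly tedious process. With the arrival of ChatGPT, that process has become more efficient and accurate.
ChatGPT is an advanced AI system that uses deep learning and natural language processing to understand and parse the content of PDF documents. One of the main benefits of using ChatGPT for PDF analysis and translation is speed and accuracy. Where traditional OCR only recognizes text, ChatGPT-based tools are promoted as also being able to identify and interpret image and table content and to output high-quality analysis results.
ChatGPT can also translate a PDF document from one language into another. This is very useful for anyone working on multilingual projects or documents, and it can help multinational companies deal more easily with customers or business partners in different countries.
In short, ChatGPT has brought a shift in PDF analysis and translation, and it will continue to push the field forward.
Several commercial products for analyzing PDFs with ChatGPT have already launched. Today I put one to the test: I picked a recently published paper at random and used ChatGPT to help analyze it.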



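Looking at the results, the analysis is little more than a rearrangement of the paper's own text. The tool clearly did not truly understand the paper, and it gave no detailed analysis. When I asked about some of the harder specifics, it simply got stuck.
Commercial sites of this kind are, put plainly, just calling the GPT API, and the output is still whatever GPT answers. What the sites add is some interaction design and some helper prompts.
Below I walk through the source code for one way to implement this, taking PDF translation as the example. (Analyzing a PDF works the same way; only the prompt changes from translation to analysis, explanation, and so on.)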
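Since only the prompt differs between tasks, the request body can be built from a small table of system prompts. The sketch below is illustrative only: the `SYSTEM_PROMPTS` table, the `build_messages` helper, and the wording of the "analyze" and "explain" prompts are assumptions. Only the translation prompt is the one used in this article.

```python
# Hypothetical prompt table: only the "translate" entry comes from this article;
# the other two prompts are made-up examples of how the task could be swapped.
SYSTEM_PROMPTS = {
    "translate": "请你成为文章翻译的小帮手,请协助翻译以下技术文件,以简体中文输出",
    "analyze": "You are a research assistant. Summarize the key contributions of the following text.",
    "explain": "You are a tutor. Explain the following text in plain language.",
}


def build_messages(task: str, text: str) -> list:
    """Return the chat messages for one request: a system prompt plus the user text."""
    if task not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[task]},
        {"role": "user", "content": text},
    ]
```

The returned list has the same shape as the `messages` argument passed to the chat completion calls in this article, so switching from translation to analysis would only mean passing a different `task`.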

First, find a PDF file on the internet; here I use the GPT-4 document from the OpenAI website as the example. Download the PDF with the requests library.
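The download step itself is not shown in the article. A minimal sketch of it might look like the following, assuming the `requests` package is installed. `download_pdf` is a hypothetical helper name, and the URL in the trailing comment is an assumption about where the GPT-4 report is hosted.

```python
from pathlib import Path

import requests


def download_pdf(url: str, dest: str, timeout: float = 30.0) -> Path:
    """Download `url` and save the response body to `dest`."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
    path = Path(dest)
    path.write_bytes(response.content)
    return path


# Example call (assumed URL; replace it with the PDF you want to process):
# download_pdf("https://cdn.openai.com/papers/gpt-4.pdf", "gpt-4.pdf")
```

Saving the file as `gpt-4.pdf` matches the file name that the extraction code reads.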

from pypdf import PdfReader

# Open the downloaded PDF and extract the text of the first page
reader = PdfReader("gpt-4.pdf")
number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()
text
'GPT-4 Technical Report\nOpenAI\x03\nAbstract\nWe report the development of GPT-4, a large-scale, multimodal model which can\naccept image and text inputs and produce text outputs. While less capable than\nhumans in many real-world scenarios, GPT-4 exhibits human-level performance\non various professional and academic benchmarks, including passing a simulated\nbar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-\nbased model pre-trained to predict the next token in a document. The post-training\nalignment process results in improved performance on measures of factuality and\nadherence to desired behavior. A core component of this project was developing\ninfrastructure and optimization methods that behave predictably across a wide\nrange of scales. This allowed us to accurately predict some aspects of GPT-4’s\nperformance based on models trained with no more than 1/1,000th the compute of\nGPT-4.\n1 Introduction\nThis technical report presents GPT-4, a large multimodal model capable of processing image and\ntext inputs and producing text outputs. Such models are an important area of study as they have the\npotential to be used in a wide range of applications, such as dialogue systems, text summarization,\nand machine translation. As such, they have been the subject of substantial interest and progress in\nrecent years [1–34].\nOne of the main goals of developing such models is to improve their ability to understand and generate\nnatural language text, particularly in more complex and nuanced scenarios. To test its capabilities\nin such scenarios, GPT-4 was evaluated on a variety of exams originally designed for humans. In\nthese evaluations it performs quite well and often outscores the vast majority of human test takers.\nFor example, on a simulated bar exam, GPT-4 achieves a score that falls in the top 10% of test takers.\nThis contrasts with GPT-3.5, which scores in the bottom 10%.\nOn a suite of traditional NLP benchmarks, GPT-4 outperforms both previous large language models\nand most state-of-the-art systems (which often have benchmark-specific training or hand-engineering).\nOn the MMLU benchmark [ 35,36], an English-language suite of multiple-choice questions covering\n57 subjects, GPT-4 not only outperforms existing models by a considerable margin in English, but\nalso demonstrates strong performance in other languages. On translated variants of MMLU, GPT-4\nsurpasses the English-language state-of-the-art in 24 of 26 languages considered. We discuss these\nmodel capability results, as well as model safety improvements and results, in more detail in later\nsections.\nThis report also discusses a key challenge of the project, developing deep learning infrastructure and\noptimization methods that behave predictably across a wide range of scales. This allowed us to make\npredictions about the expected performance of GPT-4 (based on small runs trained in similar ways)\nthat were tested against the final run to increase confidence in our training.\nDespite its capabilities, GPT-4 has similar limitations to earlier GPT models [ 1,37,38]: it is not fully\nreliable (e.g. can suffer from “hallucinations”), has a limited context window, and does not learn\n\x03Please cite this work as “OpenAI (2023)”. Full authorship contribution statements appear at the end of the\ndocument. Correspondence regarding this technical report can be sent to gpt4-report@openai.comarXiv:submit/4812508 [cs.CL] 27 Mar 2023'

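At this point the text could be sent straight to the GPT API, but GPT limits how much text a single request can contain, so the text extracted from the PDF has to be split up. Splitting it into equal-sized pieces would cut sentences in half, so here the NLTK natural language processing library is used to split the text at sentence boundaries.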

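from nltk.tokenize import sent_tokenize

# Requires NLTK's Punkt sentence tokenizer data to be downloaded once (nltk.download)
sentences = sent_tokenize(text)
for sentence in sentences:
    print(sentence)
    print('=' * 20)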
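To see why equal-sized splitting is a problem, consider the small illustration below. `fixed_width_split` is a hypothetical helper written only for this comparison, and the sample text and the width of 30 are arbitrary.

```python
def fixed_width_split(text: str, width: int) -> list:
    """Cut `text` into pieces of `width` characters (the last piece may be shorter)."""
    return [text[i:i + width] for i in range(0, len(text), width)]


sample = "GPT-4 is a Transformer-based model. It accepts image and text inputs."
pieces = fixed_width_split(sample, 30)
for piece in pieces:
    print(repr(piece))
```

The first piece stops in the middle of the word "model", so the API would receive a fragment with no complete sentence in it. Splitting at sentence boundaries avoids this.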
Next, call the GPT API to translate the split text.

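import os

import openai  # this legacy call style requires openai<1.0

TOKEN = os.environ["OPENAI_API_KEY"]  # assumption: the API key is stored in this environment variable
openai.api_key = TOKEN

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # System prompt (in Chinese): act as a translation assistant and translate
        # the following technical document into Simplified Chinese
        {"role": "system", "content": "请你成为文章翻译的小帮手,请协助翻译以下技术文件,以简体中文输出"},
        {"role": "user", "content": sentences[0]},
    ],
)
completion.choices[0].message.content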
'GPT-4技术报告\nOpenAI\n摘要\n本文报告了GPT-4的开发情况,这是一个大规模的、多模态模型,能够接受图像和文本输入,并生成文本输出。'
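# Sending one sentence per request is wasteful, so group the sentences
# into chunks of roughly 1,000 characters each
input_sentences = ''
chunks = []
for sentence in sentences:
    input_sentences += sentence
    if len(input_sentences) > 1000:
        chunks.append(input_sentences)
        input_sentences = ''
chunks.append(input_sentences)  # keep whatever is left after the last full chunk
chunks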
['GPT-4 Technical Report\nOpenAI\x03\nAbstract\nWe report the development of GPT-4, a large-scale, multimodal model which can\naccept image and text inputs and produce text outputs.While less capable than\nhumans in many real-world scenarios, GPT-4 exhibits human-level performance\non various professional and academic benchmarks, including passing a simulated\nbar exam with a score around the top 10% of test takers.GPT-4 is a Transformer-\nbased model pre-trained to predict the next token in a document.The post-training\nalignment process results in improved performance on measures of factuality and\nadherence to desired behavior.A core component of this project was developing\ninfrastructure and optimization methods that behave predictably across a wide\nrange of scales.This allowed us to accurately predict some aspects of GPT-4’s\nperformance based on models trained with no more than 1/1,000th the compute of\nGPT-4.1 Introduction\nThis technical report presents GPT-4, a large multimodal model capable of processing image and\ntext inputs and producing text outputs.',
'Such models are an important area of study as they have the\npotential to be used in a wide range of applications, such as dialogue systems, text summarization,\nand machine translation.As such, they have been the subject of substantial interest and progress in\nrecent years [1–34].One of the main goals of developing such models is to improve their ability to understand and generate\nnatural language text, particularly in more complex and nuanced scenarios.To test its capabilities\nin such scenarios, GPT-4 was evaluated on a variety of exams originally designed for humans.In\nthese evaluations it performs quite well and often outscores the vast majority of human test takers.For example, on a simulated bar exam, GPT-4 achieves a score that falls in the top 10% of test takers.This contrasts with GPT-3.5, which scores in the bottom 10%.On a suite of traditional NLP benchmarks, GPT-4 outperforms both previous large language models\nand most state-of-the-art systems (which often have benchmark-specific training or hand-engineering).',
'On the MMLU benchmark [ 35,36], an English-language suite of multiple-choice questions covering\n57 subjects, GPT-4 not only outperforms existing models by a considerable margin in English, but\nalso demonstrates strong performance in other languages.On translated variants of MMLU, GPT-4\nsurpasses the English-language state-of-the-art in 24 of 26 languages considered.We discuss these\nmodel capability results, as well as model safety improvements and results, in more detail in later\nsections.This report also discusses a key challenge of the project, developing deep learning infrastructure and\noptimization methods that behave predictably across a wide range of scales.This allowed us to make\npredictions about the expected performance of GPT-4 (based on small runs trained in similar ways)\nthat were tested against the final run to increase confidence in our training.Despite its capabilities, GPT-4 has similar limitations to earlier GPT models [ 1,37,38]: it is not fully\nreliable (e.g.can suffer from “hallucinations”), has a limited context window, and does not learn\n\x03Please cite this work as “OpenAI (2023)”.',
'Full authorship contribution statements appear at the end of the\ndocument.Correspondence regarding this technical report can be sent to gpt4-report@openai.comarXiv:submit/4812508 [cs.CL] 27 Mar 2023']
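The chunking loop above closes a chunk only after it has already passed 1,000 characters, so chunks can overshoot the limit, and the final `append` can add an empty string. A possible variant that keeps chunks within a budget is sketched below. `pack_sentences` and `max_chars` are hypothetical names, and joining sentences with a space is a choice made here, not something the original loop does.

```python
def pack_sentences(sentences, max_chars=1000):
    """Group sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole in its own chunk
    rather than being cut in the middle, so such a chunk can exceed the limit.
    """
    chunks = []
    current = ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # ignore empty strings
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # close the chunk before it overflows
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Character counts are only a rough proxy for the model's real limit, which is measured in tokens, so it is safer to leave some headroom when choosing `max_chars`.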
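# Translate the first chunk with the same prompt as before
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "请你成为文章翻译的小帮手,请协助翻译以下技术文件,以简体中文输出"},
        {"role": "user", "content": chunks[0]},
    ],
)
completion.choices[0].message.content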
'GPT-4技术报告\nOpenAI\n摘要\n本文报道了GPT-4的开发情况,它是一个可接收图像和文本输入并生成文本输出的大规模多模态模型。虽然在许多实际情境中比人类能力差,但在各种专业和学术基准测试中,GPT-4表现出人类水平的性能,包括在模拟的律师考试中获得了约是前10%考生的成绩。GPT-4是基于Transformer的模型,预先训练以预测文档中的下一个标记。后训练对齐过程提高了其实际性能和符合所需行为的程度。该项目的核心组件是开发基础设施和优化方法,可在各种规模上可靠地预测GPT-4的某些方面性能,其中包括了通过使用不超过GPT-4计算量的1/1000的模型进行训练。\n1介绍\n本技术报告介绍了GPT-4,它是一个大规模多模态模型,能够处理图像和文本输入并生成文本输出。'

The code below combines all of the snippets above into a single script that translates the PDF chunk by chunk.

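import os

import openai  # this legacy call style requires openai<1.0
from nltk.tokenize import sent_tokenize
from pypdf import PdfReader

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumption: the API key is stored in this environment variable

pdf_name = "gpt-4.pdf"
reader = PdfReader(pdf_name)
number_of_pages = len(reader.pages)

# Extract each page, split it into sentences, and pack them into chunks
chunks = []
for i in range(number_of_pages):
    page = reader.pages[i]
    text = page.extract_text()
    sentences = sent_tokenize(text)
    input_sentences = ''
    for sentence in sentences:
        input_sentences += sentence
        if len(input_sentences) > 1000:
            chunks.append(input_sentences)
            input_sentences = ''
    chunks.append(input_sentences)  # keep the remainder of the page

# Translate the first 10 chunks; loop over all of `chunks` to cover the whole document
for chunk in chunks[:10]:
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "请你成为文章翻译的小帮手,请协助翻译以下技术文件,以简体中文输出"},
            {"role": "user", "content": chunk},
        ],
    )
    print('Original:', chunk)
    print('Translation:', completion.choices[0].message.content)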
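The script makes one API call per chunk with no error handling, so a single rate-limit or network error aborts the whole run. One way to harden the loop is sketched below. `translate_chunks` is a hypothetical helper, and the API call is passed in as a plain function (`translate_fn`) so that the retry logic can be tried without network access. The retry count and delays are arbitrary assumptions.

```python
import time


def translate_chunks(chunks, translate_fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Translate every non-empty chunk, retrying failed calls with exponential backoff.

    `translate_fn` takes one chunk of text and returns its translation.
    """
    results = []
    for chunk in chunks:
        if not chunk.strip():
            continue  # skip empty chunks left over from page boundaries
        for attempt in range(max_retries):
            try:
                results.append(translate_fn(chunk))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt
                sleep(base_delay * 2 ** attempt)  # delay doubles after each failure
    return results
```

With the legacy client used in this article, `translate_fn` could be a small function that calls `openai.ChatCompletion.create` with the translation prompt and returns `completion.choices[0].message.content`.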
