Converting Word, TXT, PPT, Excel, and image files to PDF

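When I wrote about online document preview earlier, I left a loose end. The approach I recommended back then was to have the backend convert every document type to PDF and let the frontend display the result, but I never showed how to do the conversion itself. This post fills that gap.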

Preparation

The code is based on aspose-words (for Word and TXT to PDF) and itextpdf (for PPT, image, and Excel to PDF), so add the following dependencies to your project first.

1. Required Maven dependencies

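<dependency>
    <groupId>com.luhuiguo</groupId>
    <artifactId>aspose-words</artifactId>
    <version>23.1</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>5.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-excelant</artifactId>
    <version>5.2.0</version>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.2</version>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext-asian</artifactId>
    <version>5.2.0</version>
</dependency>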

2. Utility class used by the code below:

package com.fhey.service.common.utils.file;

import cn.hutool.core.util.StrUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/**
 * @author fhey
 * @date 2023-04-20 11:15:58
 * @description: File utility class
 */
public class FileUtil {

    private static final Logger logger = LoggerFactory.getLogger(FileUtil.class);

    // Build the full path of the output file
    public static String getNewFileFullPath(String sourceFilePath, String destFilePath, String ext) {
        File destFile = new File(destFilePath);
        // destFilePath already points to an existing file: use it as is
        if (destFile.isFile()) {
            return destFilePath;
        }
        File sourceFile = new File(sourceFilePath);
        String sourceFileName = sourceFile.getName();
        if (sourceFile.isFile()) {
            // Replace the source extension with the target extension
            return destFilePath + File.separator
                    + sourceFileName.substring(0, sourceFileName.lastIndexOf(StrUtil.DOT))
                    + StrUtil.DOT + ext;
        }
        return destFilePath + File.separator + sourceFileName + StrUtil.DOT + ext;
    }

    // Check whether a file is an image by looking at its header bytes
    public static boolean isImage(File file) throws IOException {
        byte[] bytes = new byte[8];
        // try-with-resources closes the stream even if read() fails
        try (FileInputStream is = new FileInputStream(file)) {
            is.read(bytes);
        }
        String type = bytesToHexString(bytes).toUpperCase();
        // A signature only counts at the start of the file, so use startsWith
        return type.startsWith("FFD8FF")       // JPEG (jpg)
                || type.startsWith("89504E47") // PNG
                || type.startsWith("47494638") // GIF
                || type.startsWith("49492A00") // TIFF (tif)
                || type.startsWith("424D");    // Bitmap (bmp)
    }

    // Convert the file header bytes to a hexadecimal string
    public static String bytesToHexString(byte[] src) {
        if (src == null || src.length <= 0) {
            return null;
        }
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < src.length; i++) {
            int v = src[i] & 0xFF;
            String hv = Integer.toHexString(v);
            if (hv.length() < 2) {
                builder.append(0);
            }
            builder.append(hv);
        }
        return builder.toString();
    }
}
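The idea behind isImage is to compare the first bytes of a file against known format signatures. Below is a minimal, JDK-only sketch of that check. The class name ImageSniffer and the method looksLikeImage are made up for illustration and are not part of the article's FileUtil. It compares raw bytes at offset 0 instead of searching a hex string, on the assumption that a signature should only count when it sits at the very start of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of the article's FileUtil
class ImageSniffer {
    // Leading bytes of the formats listed in isImage
    private static final byte[][] SIGNATURES = {
        {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}, // JPEG
        {(byte) 0x89, 0x50, 0x4E, 0x47},         // PNG
        {0x47, 0x49, 0x46, 0x38},                // GIF
        {0x49, 0x49, 0x2A, 0x00},                // TIFF (little-endian)
        {0x42, 0x4D}                             // BMP
    };

    // True when the header begins with one of the known signatures
    static boolean looksLikeImage(byte[] header) {
        for (byte[] signature : SIGNATURES) {
            if (startsWith(header, signature)) {
                return true;
            }
        }
        return false;
    }

    // Reads up to 8 bytes from the file and checks them
    static boolean looksLikeImage(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8];
            int read = in.read(buffer);
            return looksLikeImage(Arrays.copyOf(buffer, Math.max(read, 0)));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Working on bytes avoids the hex-string round trip, and a file shorter than the signature is simply reported as not an image.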

I. Word to PDF (supports doc and docx)

Converting Word to PDF is fairly simple: aspose-words does almost all of the work for us, and both doc and docx are supported.
Code:
public static void wordToPdf(String wordPath, String pdfPath) throws Exception {
    pdfPath = FileUtil.getNewFileFullPath(wordPath, pdfPath, "pdf");
    // try-with-resources closes the output stream even if the conversion fails
    try (FileOutputStream os = new FileOutputStream(new File(pdfPath))) {
        // Document here is com.aspose.words.Document
        Document doc = new Document(wordPath);
        doc.save(os, com.aspose.words.SaveFormat.PDF);
    }
}
Verification code:
public static void main(String[] args) throws Exception {
    // Backslashes in Windows paths must be escaped in Java string literals
    wordToPdf("D:\\书籍\\电子书\\其它\\《山海经》异兽图.docx", "D:\\test");
}
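The result is shown below. Formatting, text, and images all come through fine, and both doc and docx files converted successfully in my tests.
[Image: screenshot of the converted PDF]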

II. TXT to PDF

TXT to PDF simply reuses the Word conversion code.
Code:
public static void txtToPdf(String txtPath, String pdfPath) throws Exception {
    wordToPdf(txtPath, pdfPath);
}
Verification code:
public static void main(String[] args) throws Exception {
    txtToPdf("D:\\书籍\\电子书\\国外名著\\君主论.txt", "D:\\test");
}
The result is shown below.
[Image: screenshot of the converted PDF]

III. PPT to PDF (supports ppt and pptx)

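On to PPT. I hear your company does not allow PPT files, so let's convert them to PDF and use those instead. This is where the code starts to get more involved: it relies on Apache POI, itextpdf, and Graphics2D (from java.awt). Combining the three, I wrote a first version that handles both ppt and pptx.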

PPT to PDF, first version

// Types used: java.awt.Color, java.awt.Dimension, java.awt.Graphics2D, java.awt.image.BufferedImage,
// org.apache.poi.sl.usermodel (SlideShow, Slide, Shape, TextShape, TextParagraph, TextRun),
// org.apache.poi.xslf.usermodel.XMLSlideShow, org.apache.poi.hslf.usermodel.HSLFSlideShow,
// com.itextpdf.text.pdf.PdfWriter
public static void pptToPdf(String pptPath, String pdfPath) throws IOException {
    pdfPath = FileUtil.getNewFileFullPath(pptPath, pdfPath, "pdf");
    com.itextpdf.text.Document document = null;
    FileOutputStream fileOutputStream = null;
    PdfWriter pdfWriter = null;
    try (InputStream inputStream = Files.newInputStream(Paths.get(pptPath))) {
        SlideShow<?, ?> slideShow;
        String ext = pptPath.substring(pptPath.lastIndexOf("."));
        if (ext.equalsIgnoreCase(".pptx")) {
            slideShow = new XMLSlideShow(inputStream);
        } else {
            slideShow = new HSLFSlideShow(inputStream);
        }
        Dimension dimension = slideShow.getPageSize();
        int width = (int) dimension.getWidth();
        int height = (int) dimension.getHeight();
        fileOutputStream = new FileOutputStream(pdfPath);
        //document = new com.itextpdf.text.Document(new com.itextpdf.text.Rectangle((float) dimension.getWidth(), (float) dimension.getHeight()));
        document = new com.itextpdf.text.Document();
        pdfWriter = PdfWriter.getInstance(document, fileOutputStream);
        document.open();
        for (Slide<?, ?> slide : slideShow.getSlides()) {
            // Set the font to avoid garbled Chinese text
            setPPTFont(slide, "宋体");
            BufferedImage bufferedImage = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            Graphics2D graphics2d = bufferedImage.createGraphics();
            graphics2d.setPaint(Color.white);
            // Paint the background; setPaint alone does not draw anything
            graphics2d.fillRect(0, 0, width, height);
            graphics2d.setFont(new java.awt.Font("宋体", java.awt.Font.PLAIN, 12));
            slide.draw(graphics2d);
            graphics2d.dispose();
            com.itextpdf.text.Image image = com.itextpdf.text.Image.getInstance(bufferedImage, null);
            image.scaleToFit((float) dimension.getWidth(), (float) dimension.getHeight());
            document.add(image);
            document.newPage();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (document != null) {
                document.close();
            }
            if (fileOutputStream != null) {
                fileOutputStream.close();
            }
            if (pdfWriter != null) {
                pdfWriter.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

private static void setPPTFont(Slide<?, ?> slide, String fontFamily) {
    // Set the font on every text run to avoid garbled Chinese text
    for (Shape<?, ?> shape : slide.getShapes()) {
        if (shape instanceof TextShape) {
            TextShape<?, ?> textShape = (TextShape<?, ?>) shape;
            for (TextParagraph<?, ?, ?> textParagraph : textShape.getTextParagraphs()) {
                for (TextRun textRun : textParagraph.getTextRuns()) {
                    textRun.setFontFamily(fontFamily);
                }
            }
        }
    }
}
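One detail in the rendering loop above is worth isolating: a fresh BufferedImage of type TYPE_INT_RGB starts out black, and Graphics2D.setPaint only selects the paint for later drawing calls. The JDK-only sketch below shows the difference. The class CanvasDemo and the method newCanvas are illustrative names, not part of the article's code, and whether a given slide already paints its own background depends on the deck.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical demo class, not part of the article's code
class CanvasDemo {
    // Creates a canvas and optionally paints it white before any drawing
    static BufferedImage newCanvas(int width, int height, boolean fillWhite) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setPaint(Color.WHITE);             // only selects the paint
            if (fillWhite) {
                g.fillRect(0, 0, width, height); // actually colors the pixels
            }
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) {
        System.out.printf("%08X%n", newCanvas(4, 4, false).getRGB(0, 0)); // prints FF000000 (black)
        System.out.printf("%08X%n", newCanvas(4, 4, true).getRGB(0, 0));  // prints FFFFFFFF (white)
    }
}
```

If a slide has transparent areas, the unfilled canvas would show through as black in the generated PDF page, which is why filling first is the safer default.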
Verification code:
public static void main(String[] args) throws Exception {
    pptToPdf("C:\\Users\\jie\\Desktop\\预览\\web\\files\\河西走廊见闻录.pptx", "D:\\test");
}
The result is shown below.
[Image: screenshot of the converted PDF]

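As you can see, the result is not great: the slide content is cut off. So I searched online for a fix and found a rather curious one: write the rendered image into a PdfPTable object first, then add the PdfPTable to the document. I adapted my code accordingly and wrote the second version.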

PPT to PDF, second version

// Additionally uses com.itextpdf.text.pdf.PdfPTable and com.itextpdf.text.pdf.PdfPCell
public static void pptToPdf(String pptPath, String pdfPath) throws IOException {
    pdfPath = FileUtil.getNewFileFullPath(pptPath, pdfPath, "pdf");
    com.itextpdf.text.Document document = null;
    FileOutputStream fileOutputStream = null;
    PdfWriter pdfWriter = null;
    try (InputStream inputStream = Files.newInputStream(Paths.get(pptPath))) {
        SlideShow<?, ?> slideShow;
        String ext = pptPath.substring(pptPath.lastIndexOf("."));
        if (ext.equalsIgnoreCase(".pptx")) {
            slideShow = new XMLSlideShow(inputStream);
        } else {
            slideShow = new HSLFSlideShow(inputStream);
        }
        Dimension dimension = slideShow.getPageSize();
        int width = (int) dimension.getWidth();
        int height = (int) dimension.getHeight();
        fileOutputStream = new FileOutputStream(pdfPath);
        //document = new com.itextpdf.text.Document(new com.itextpdf.text.Rectangle((float) dimension.getWidth(), (float) dimension.getHeight()));
        document = new com.itextpdf.text.Document();
        pdfWriter = PdfWriter.getInstance(document, fileOutputStream);
        document.open();
        PdfPTable pdfPTable = new PdfPTable(1);
        for (Slide<?, ?> slide : slideShow.getSlides()) {
            // Set the font to avoid garbled Chinese text
            setPPTFont(slide, "宋体");
            BufferedImage bufferedImage = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            Graphics2D graphics2d = bufferedImage.createGraphics();
            graphics2d.setPaint(Color.white);
            // Paint the background; setPaint alone does not draw anything
            graphics2d.fillRect(0, 0, width, height);
            graphics2d.setFont(new java.awt.Font("宋体", java.awt.Font.PLAIN, 12));
            slide.draw(graphics2d);
            graphics2d.dispose();
            com.itextpdf.text.Image image = com.itextpdf.text.Image.getInstance(bufferedImage, null);
            image.scaleToFit((float) dimension.getWidth(), (float) dimension.getHeight());
            // Write the image into a table cell; "true" makes the image fit the cell
            pdfPTable.addCell(new PdfPCell(image, true));
            document.add(pdfPTable);
            // Clear the table so the next slide starts with an empty one
            pdfPTable.deleteBodyRows();
            document.newPage();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (document != null) {
                document.close();
            }
            if (fileOutputStream != null) {
                fileOutputStream.close();
            }
            if (pdfWriter != null) {
                pdfWriter.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The result is shown below.
[Image: screenshot of the converted PDF]

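The slide content is now shown in full, so at this point PPT to PDF basically works. The result still is not perfect, though: what we really want is for the PDF to look the same as the slides. Since every slide in a deck has the same width and height, I wondered whether I could set the PDF page size to match the slide size. So I started by looking at how the PDF document is initialized: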
com.itextpdf.text.Document document = new com.itextpdf.text.Document();
I then found that, besides the default constructor, com.itextpdf.text.Document also has this constructor:
public Document(Rectangle pageSize) {
    this(pageSize, 36.0F, 36.0F, 36.0F, 36.0F);
}
Stepping into the com.itextpdf.text.Rectangle class, I found a constructor that sets the width and height:
public Rectangle(float urx, float ury) {
    this(0.0F, 0.0F, urx, ury);
}
So I changed the Document initialization in my code as follows (starting from the first version; the PdfPTable from the second version is no longer needed):
document = new com.itextpdf.text.Document();
// changed to:
document = new com.itextpdf.text.Document(new com.itextpdf.text.Rectangle((float) dimension.getWidth(), (float) dimension.getHeight()));
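To see why the page size matters, it helps to compare the slide size with the area a page actually offers. The sketch below is plain arithmetic with no PDF library involved. It assumes a 960 x 540 point slide (a common 16:9 size; real decks vary) and uses iText 5's default A4 page of 595 x 842 points with the 36-point margins that appear in the constructor above. PageFit, usable, and fitScale are illustrative names only.

```java
// Hypothetical helper for illustration; not part of iText or the article's code
class PageFit {
    // Length left on one side of the page after subtracting both margins
    static float usable(float pageSide, float marginA, float marginB) {
        return pageSide - marginA - marginB;
    }

    // Uniform scale factor (at most 1) needed to fit content inside a box
    static float fitScale(float contentW, float contentH, float boxW, float boxH) {
        return Math.min(1f, Math.min(boxW / contentW, boxH / contentH));
    }

    public static void main(String[] args) {
        float slideW = 960f, slideH = 540f;          // assumed slide size in points
        float a4W = 595f, a4H = 842f, margin = 36f;  // default page and margins
        float boxW = usable(a4W, margin, margin);    // 523 points wide
        float boxH = usable(a4H, margin, margin);    // 770 points high
        // On A4 the slide image would have to shrink to roughly 54% to fit
        System.out.println(fitScale(slideW, slideH, boxW, boxH));
        // On a page the same size as the slide, ignoring margins, no shrinking is needed
        System.out.println(fitScale(slideW, slideH, slideW, slideH));
    }
}
```

Under these assumptions an unscaled slide image is wider than a default A4 page, which is consistent with the cut-off output of the first version. Note that the one-argument Document constructor still applies 36-point margins, so the usable area stays slightly smaller than the page; if that matters, the five-argument constructor it delegates to accepts explicit margins.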

PPT to PDF, third version (final)

public static void pptToPdf(String pptPath, String pdfPath) throws IOException {
    pdfPath = FileUtil.getNewFileFullPath(pptPath, pdfPath, "pdf");
    com.itextpdf.text.Document document = null;
    FileOutputStream fileOutputStream = null;
    PdfWriter pdfWriter = null;
    try (InputStream inputStream = Files.newInputStream(Paths.get(pptPath))) {
        SlideShow<?, ?> slideShow;
        String ext = pptPath.substring(pptPath.lastIndexOf("."));
        if (ext.equalsIgnoreCase(".pptx")) {
            slideShow = new XMLSlideShow(inputStream);
        } else {
            slideShow = new HSLFSlideShow(inputStream);
        }
        Dimension dimension = slideShow.getPageSize();
        int width = (int) dimension.getWidth();
        int height = (int) dimension.getHeight();
        fileOutputStream = new FileOutputStream(pdfPath);
        //document = new com.itextpdf.text.Document();
        // Make each PDF page the same size as a slide
        document = new com.itextpdf.text.Document(new com.itextpdf.text.Rectangle((float) dimension.getWidth(), (float) dimension.getHeight()));
        pdfWriter = PdfWriter.getInstance(document, fileOutputStream);
        document.open();
        for (Slide<?, ?> slide : slideShow.getSlides()) {
            // Set the font to avoid garbled Chinese text
            setPPTFont(slide, "宋体");
            BufferedImage bufferedImage = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            Graphics2D graphics2d = bufferedImage.createGraphics();
            graphics2d.setPaint(Color.white);
            // Paint the background; setPaint alone does not draw anything
            graphics2d.fillRect(0, 0, width, height);
            graphics2d.setFont(new java.awt.Font("宋体", java.awt.Font.PLAIN, 12));
            slide.draw(graphics2d);
            graphics2d.dispose();
            com.itextpdf.text.Image image = com.itextpdf.text.Image.getInstance(bufferedImage, null);
            image.scaleToFit((float) dimension.getWidth(), (float) dimension.getHeight());
            document.add(image);
            document.newPage();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (document != null) {
                document.close();
            }
            if (fileOutputStream != null) {
                fileOutputStream.close();
            }
            if (pdfWriter != null) {
                pdfWriter.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The result is shown below.
[Image: screenshot of the converted PDF]

The output now looks the same as the original slides, and I verified that both ppt and pptx files convert successfully.
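pptToPdf chooses between XMLSlideShow and HSLFSlideShow based on the file extension. A file name without a dot would make substring(lastIndexOf(".")) throw, and the case of an extension can vary, so a slightly more defensive check may be worth having. The sketch below is one way to do it with the JDK only; Extensions, extensionOf, and isPptx are hypothetical helpers, not part of the article's code.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the article's code
class Extensions {
    // Returns the lower-cased extension without the dot, or "" when there is none
    static String extensionOf(String path) {
        int separator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        // A dot inside a directory name does not count as an extension
        if (dot <= separator) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True for the XML-based format; everything else would go to the legacy branch
    static boolean isPptx(String path) {
        return "pptx".equals(extensionOf(path));
    }
}
```

With this, isPptx("D:\\test\\Deck.PPTX") selects the pptx branch, while a path such as "D:\\my.files\\deck" yields an empty extension instead of an exception.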

IV. Image to PDF

Image to PDF uses two libraries, Apache POI and itextpdf. Because itextpdf can only parse a limited set of image formats, open the image-reading method of com.itextpdf.text.Image.
