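Online Document Preview, Part 3 (New Edition): Previewing Files by Converting Them All to PDF

An earlier article on online document preview left a small gap. The approach it recommended was to have the back end convert every type of document to PDF and let the front end display the result, but it did not show how to do those conversions. This article fills that gap.
For ways to preview PDF files on the front end, refer to this article: .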

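Preparation

The code is based on aspose-words (Word and txt to PDF), itextpdf (PPT, images, and Excel to PDF), poi (Word to PDF), and spire (Word and Excel to PDF), so the following dependencies need to be added to the project first.

1. Required Maven dependencies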

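<dependency>
    <groupId>com.luhuiguo</groupId>
    <artifactId>aspose-words</artifactId>
    <version>23.1</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>5.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-excelant</artifactId>
    <version>5.2.0</version>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.2</version>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext-asian</artifactId>
    <version>5.2.0</version>
</dependency>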

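Adding the spire dependency (spire is a commercial product; there is a free edition, but it limits the number of pages and words. Skip this step if you do not use the spire approach.)

Before adding spire to the pom, the Maven repository it comes from has to be added:
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.cn/repository/maven-public/</url>
    </repository>
</repositories>
Then add one of the following dependencies to the project's pom file.
Free edition:
<dependency>
    <groupId>e-iceblue</groupId>
    <artifactId>spire.office.free</artifactId>
    <version>5.3.1</version>
</dependency>
Paid edition:
<dependency>
    <groupId>e-iceblue</groupId>
    <artifactId>spire.office</artifactId>
    <version>5.3.1</version>
</dependency>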

2. Utility class used by the code below:

package com.fhey.service.common.utils.file;

import cn.hutool.core.util.StrUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/**
 * @author fhey
 * @date 2023-04-20 11:15:58
 * @description: file utility class
 */
public class FileUtil {

    private static final Logger logger = LoggerFactory.getLogger(FileUtil.class);

    // Build the full path of the new (converted) file
    public static String getNewFileFullPath(String sourceFilePath, String destFilePath, String ext) {
        File destFile = new File(destFilePath);
        // An existing file is used as the target as-is
        if (destFile.isFile()) {
            return destFilePath;
        }
        // Otherwise destFilePath is treated as a directory
        File sourceFile = new File(sourceFilePath);
        String sourceFileName = sourceFile.getName();
        if (sourceFile.isFile()) {
            return destFilePath + File.separator
                    + sourceFileName.substring(0, sourceFileName.lastIndexOf(StrUtil.DOT))
                    + StrUtil.DOT + ext;
        }
        return destFilePath + File.separator + sourceFileName + StrUtil.DOT + ext;
    }

    // Check whether a file is an image by inspecting its header bytes
    public static boolean isImage(File file) throws IOException {
        byte[] bytes = new byte[8];
        // try-with-resources closes the stream even if the read fails
        try (FileInputStream is = new FileInputStream(file)) {
            is.read(bytes);
        }
        String type = bytesToHexString(bytes).toUpperCase();
        return type.contains("FFD8FF")       // JPEG (jpg)
                || type.contains("89504E47") // PNG
                || type.contains("47494638") // GIF
                || type.contains("49492A00") // TIFF (tif)
                || type.contains("424D");    // Bitmap (bmp)
    }

    // Convert the file header to a hexadecimal string
    public static String bytesToHexString(byte[] src) {
        StringBuilder builder = new StringBuilder();
        if (src == null || src.length <= 0) {
            return null;
        }
        for (int i = 0; i < src.length; i++) {
            int v = src[i] & 0xFF;
            String hv = Integer.toHexString(v);
            // Pad single-digit values so every byte yields two characters
            if (hv.length() < 2) {
                builder.append(0);
            }
            builder.append(hv);
        }
        return builder.toString();
    }
}
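One thing worth noting about isImage is that it tests the signatures with contains, so a signature that appears anywhere in the first 8 bytes counts as a match. The sketch below is a hypothetical JDK-only variant (the class ImageHeaderSniffer and its method names are not from the article) that checks the same signatures as prefixes instead. It is an illustration of the idea rather than a replacement for FileUtil.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of the article's FileUtil: matches magic numbers as prefixes. */
final class ImageHeaderSniffer {

    // The same signatures the article's isImage looks for, as upper-case hex.
    private static final String[] IMAGE_PREFIXES = {
        "FFD8FF",   // JPEG
        "89504E47", // PNG
        "47494638", // GIF
        "49492A00", // TIFF
        "424D"      // BMP
    };

    /** Converts the first `length` bytes to upper-case hex, two characters per byte. */
    static String toHex(byte[] bytes, int length) {
        StringBuilder sb = new StringBuilder(length * 2);
        for (int i = 0; i < length; i++) {
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Returns true if the header starts with one of the known image signatures. */
    static boolean isImageHeader(byte[] header, int length) {
        String hex = toHex(header, length);
        for (String prefix : IMAGE_PREFIXES) {
            if (hex.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Reads up to 8 bytes from the file and checks them; the stream is always closed. */
    static boolean isImage(Path file) throws IOException {
        byte[] header = new byte[8];
        try (InputStream in = Files.newInputStream(file)) {
            // read returns -1 for an empty file; treat that as zero bytes read
            int read = Math.max(in.read(header), 0);
            return isImageHeader(header, read);
        }
    }

    public static void main(String[] args) {
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        byte[] notImage = {0x00, 0x42, 0x4D, 0x00, 0x00, 0x00, 0x00, 0x00};
        System.out.println(isImageHeader(png, png.length));           // true
        System.out.println(isImageHeader(notImage, notImage.length)); // false: 424D is not at the start
    }
}
```

With contains, the second header above would be reported as a bitmap because the bytes 42 4D appear at offset 1.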

I. Converting Word files to PDF (doc and docx supported)

1. Using Aspose

Converting Word to PDF is fairly simple, because aspose-words does almost all of the work for us. Both doc and docx are supported.
Code:
public static void wordToPdf(String wordPath, String pdfPath) throws Exception {
    pdfPath = FileUtil.getNewFileFullPath(wordPath, pdfPath, "pdf");
    File file = new File(pdfPath);
    // try-with-resources closes the output stream even if the conversion fails
    try (FileOutputStream os = new FileOutputStream(file)) {
        Document doc = new Document(wordPath); // com.aspose.words.Document
        doc.save(os, com.aspose.words.SaveFormat.PDF);
    }
}
Verification code:
public static void main(String[] args) throws Exception {
    wordToPdf("D:\\书籍\\电子书\\其它\\《山海经》异兽图.docx", "D:\\test");
}
The formatting, text, and images all came through without problems, and both doc and docx files converted successfully in testing.
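In the verification call, the second argument is a directory, and the output file name is derived from the source file. The sketch below restates that rule with java.nio so it can be tried without any Word file. OutputPathSketch and resolveOutput are hypothetical names, and unlike FileUtil.getNewFileFullPath this version does not check whether the source file exists; it also keeps the full name when the source has no extension.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that mirrors the idea behind FileUtil.getNewFileFullPath. */
final class OutputPathSketch {

    /**
     * If dest is an existing regular file it is used as-is. Otherwise dest is treated
     * as a directory and "<source base name>.<ext>" is appended to it.
     */
    static Path resolveOutput(Path source, Path dest, String ext) {
        if (Files.isRegularFile(dest)) {
            return dest;
        }
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // A name without an extension keeps its full name instead of failing on substring(0, -1)
        String base = dot > 0 ? name.substring(0, dot) : name;
        return dest.resolve(base + "." + ext);
    }

    public static void main(String[] args) {
        Path out = resolveOutput(Paths.get("docs", "report.docx"), Paths.get("converted"), "pdf");
        System.out.println(out);
    }
}
```

A consequence of this rule is that a target such as out.pdf that does not exist yet is treated as a directory, so the result lands in out.pdf/report.pdf. Passing a directory, as the verification code does, avoids the surprise.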

2. Using POI

Code:
// Note: PdfOptions and PdfConverter are not part of Apache POI itself. They appear to come from
// the xdocreport project's XWPF-to-PDF converter, which is not in the dependency list above.
public void wordToPdf(String wordPath, String pdfPath) throws Exception {
    pdfPath = FileUtil.getNewFileFullPath(wordPath, pdfPath, "pdf");
    try (FileInputStream fileInputStream = new FileInputStream(wordPath);
         FileOutputStream fileOutputStream = new FileOutputStream(pdfPath)) {
        String ext = wordPath.substring(wordPath.lastIndexOf("."));
        XWPFDocument document = null;
        if (".docx".equals(ext)) {
            document = new XWPFDocument(fileInputStream);
        } else if (".doc".equals(ext)) {
            HWPFDocument hwpfDocument = new HWPFDocument(fileInputStream);
            document = hwPFDocumentToXWPFDocument(hwpfDocument); // known to have problems
        } else {
            throw new Exception("Invalid file format");
        }
        // Debug only: dump the intermediate docx to a fixed path
        // document.write(new FileOutputStream("D:\\test\\test.docx"));
        PdfOptions pdfOptions = PdfOptions.create();
        PdfConverter.getInstance().convert(document, fileOutputStream, pdfOptions);
        document.close();
    }
}

// Rebuild a doc (HWPF) document as a docx (XWPF) document: text, tables, and pictures only
public XWPFDocument hwPFDocumentToXWPFDocument(HWPFDocument hwpfDocument) throws Exception {
    XWPFDocument xwpfDocument = new XWPFDocument();
    xwpfDocument.createStyles();
    Range range = hwpfDocument.getRange();
    for (int i = 0; i < range.numParagraphs(); i++) {
        Paragraph paragraph = range.getParagraph(i);
        XWPFParagraph xwpfParagraph = xwpfDocument.createParagraph();
        if (paragraph.isInTable()) {
            Table table = range.getTable(paragraph);
            if (table != null && table.numRows() > 0) {
                int rows = table.numRows();
                int cols = table.getRow(0).numCells();
                XWPFTable xwpfTable = xwpfDocument.createTable(rows, cols);
                for (int r = 0; r < rows; r++) {
                    TableRow tableRow = table.getRow(r);
                    if (tableRow != null && tableRow.numCells() > 0) {
                        for (int c = 0; c < cols; c++) {
                            TableCell tableCell = tableRow.getCell(c);
                            if (tableCell != null) {
                                XWPFTableCell xwpfTableCell = xwpfTable.getRow(r).getCell(c);
                                xwpfTableCell.setText(tableCell.text());
                            }
                        }
                    }
                }
            }
        } else {
            int d = 0; // number of pictures already copied for this paragraph
            for (int j = 0; j < paragraph.numCharacterRuns(); j++) {
                CharacterRun run = paragraph.getCharacterRun(j);
                Picture picture = hwpfDocument.getPicturesTable().extractPicture(run, false);
                if (picture != null) {
                    byte[] pictureBytes = picture.getContent();
                    String pictureType = picture.getMimeType();
                    String fileName = picture.suggestFullFileName();
                    int pictureType1 = getPictureType(pictureType);
                    if (pictureType1 == 0) {
                        continue; // unsupported picture type
                    }
                    if (d > 0) {
                        continue; // only the first picture of a paragraph is copied
                    }
                    InputStream inputStream = new ByteArrayInputStream(pictureBytes);
                    XWPFParagraph pictureParagraph = xwpfDocument.createParagraph();
                    XWPFRun pictureRun = pictureParagraph.createRun();
                    pictureRun.addPicture(inputStream, pictureType1, fileName,
                            Units.toEMU(picture.getWidth()), Units.toEMU(picture.getHeight()));
                    // Carry over the font and formatting of the previous run
                    int size = xwpfParagraph.getRuns().size();
                    if (size == 0) {
                        continue;
                    }
                    XWPFRun previousRun = xwpfParagraph.getRuns().get(size - 1);
                    pictureRun.setFontFamily(previousRun.getFontFamily());
                    pictureRun.setFontSize(previousRun.getFontSize());
                    pictureRun.setBold(previousRun.isBold());
                    pictureRun.setItalic(previousRun.isItalic());
                    // Set other formatting here as needed
                    xwpfParagraph.addRun(pictureRun);
                    d++;
                } else {
                    XWPFRun xwpfRun = xwpfParagraph.createRun();
                    xwpfRun.setText(run.text());
                }
            }
        }
    }
    hwpfDocument.close();
    return xwpfDocument;
}

// Map a MIME type to the XWPF picture type constant; 0 means unsupported
public static int getPictureType(String mimeType) {
    if (mimeType.equals("image/jpeg")) {
        return Document.PICTURE_TYPE_JPEG;
    } else if (mimeType.equals("image/png")) {
        return Document.PICTURE_TYPE_PNG;
    } else if (mimeType.equals("image/gif")) {
        return Document.PICTURE_TYPE_GIF;
    } else if (mimeType.equals("image/bmp")) {
        return Document.PICTURE_TYPE_BMP;
    } else {
        return 0;
        //throw new RuntimeException("Unsupported picture: " + mimeType + ". Expected emf|wmf|pict|jpeg|png|dib|gif|tiff|eps|bmp|wpg");
    }
}
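The POI version picks the parser by comparing the raw file extension, which is case-sensitive and throws a StringIndexOutOfBoundsException when the path has no dot. A possible way to make that step more forgiving is sketched below using only the JDK. ExtensionSketch, extensionOf, and isWord are hypothetical names introduced for illustration, not part of the article's code.

```java
import java.util.Locale;

/** Hypothetical helper: case-insensitive extension lookup that tolerates paths without a dot. */
final class ExtensionSketch {

    /** Returns the lower-cased extension without the dot, or "" if the file name has none. */
    static String extensionOf(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        // The dot must be inside the file name, not its first character, and not its last
        if (dot <= slash + 1 || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** True for the two Word formats handled in this section. */
    static boolean isWord(String path) {
        String ext = extensionOf(path);
        return "doc".equals(ext) || "docx".equals(ext);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("D:\\test\\Report.DOCX")); // docx
        System.out.println(extensionOf("archive.v2/readme"));     // (empty line)
        System.out.println(isWord("notes.txt"));                  // false
    }
}
```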

3. Using Spire

Because the free edition is used here, only the first three pages are generated. If you need more than three pages, choose the paid edition.

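II. Converting txt files to PDF

Converting a txt file to PDF can simply reuse the Word code.
Code:
public static void txtToPdf(String txtPath, String pdfPath) throws Exception {
    // The Aspose-based wordToPdf also accepts plain text files
    wordToPdf(txtPath, pdfPath);
}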
Verification code:
public static void main(String[] args) throws Exception {
    txtToPdf("D:\\书籍\\电子书\\国外名著\\君主论.txt", "D:\\test");
}

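III. Converting PPT files to PDF (ppt and pptx supported)

I hear your company doesn't allow PPT, so let's convert PPT to PDF and use that instead. From here on the code gets more complicated. It relies on Apache POI, itextpdf, and Graphics2D (the last one is part of the JDK's java.awt package rather than a separate library). Combining the three, and supporting both ppt and pptx, I wrote the first version of the code.

PPT to PDF, first version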

public static void pptToPdf(String pptPath, String pdfPath) throws IOException {
    pdfPath = FileUtil.getNewFileFullPath(pptPath, pdfPath, "pdf");
    com.itextpdf.text.Document document = null;
    FileOutputStream fileOutputStream = null;
    PdfWriter pdfWriter = null;
    // try-with-resources closes the input stream once the slides have been rendered
    try (InputStream inputStream = Files.newInputStream(Paths.get(pptPath))) {
        SlideShow<?, ?> slideShow;
        String ext = pptPath.substring(pptPath.lastIndexOf("."));
        if (ext.equals(".pptx")) {
            slideShow = new XMLSlideShow(inputStream);
        } else {
            slideShow = new HSLFSlideShow(inputStream);
        }
        Dimension dimension = slideShow.getPageSize();
        fileOutputStream = new FileOutputStream(pdfPath);
        //document = new com.itextpdf.text.Document(new com.itextpdf.text.Rectangle((float) dimension.getWidth(), (float) dimension.getHeight()));
        document = new com.itextpdf.text.Document();
        pdfWriter = PdfWriter.getInstance(document, fileOutputStream);
        document.open();
        for (Slide<?, ?> slide : slideShow.getSlides()) {
            // Set the font to avoid garbled Chinese text ("宋体" is SimSun)
            setPPTFont(slide, "宋体");
            // Render the slide into an in-memory image
            BufferedImage bufferedImage = new BufferedImage((int) dimension.getWidth(), (int) dimension.getHeight(), BufferedImage.TYPE_INT_RGB);
            Graphics2D graphics2d = bufferedImage.createGraphics();
            graphics2d.setPaint(Color.white);
            graphics2d.setFont(new java.awt.Font("宋体", java.awt.Font.PLAIN, 12));
            slide.draw(graphics2d);
            graphics2d.dispose();
            // Add the rendered image to the PDF, one slide per page
            com.itextpdf.text.Image image = com.itextpdf.text.Image.getInstance(bufferedImage, null);
            image.scaleToFit((float) dimension.getWidth(), (float) dimension.getHeight());
            document.add(image);
            document.newPage();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (document != null) {
                document.close();
            }
            if (fileOutputStream != null) {
                fileOutputStream.close();
            }
            if (pdfWriter != null) {
                pdfWriter.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

private static void setPPTFont(Slide<?, ?> slide, String fontFamily) {
    // Set the font on every text run to avoid garbled Chinese text
    for (Shape<?, ?> shape : slide.getShapes()) {
        if (shape instanceof TextShape) {
            TextShape<?, ?> textShape = (TextShape<?, ?>) shape;
            for (TextParagraph<?, ?, ?> textParagraph : textShape.getTextParagraphs()) {
                for (TextRun textRun : textParagraph.getTextRuns()) {
                    textRun.setFontFamily(fontFamily);
                }
            }
        }
    }
}
Verification code:
public static void main(String[] args) throws Exception {
    pptToPdf("C:\\Users\\jie\\Desktop\\预览\\web\\files\\河西走廊见闻录.pptx", "D:\\test");
}
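The result of this version is not very good: the slide content is cut off. I went looking online for a fix and found a rather magical workaround: write the rendered image into a PdfPTable first, then add the PdfPTable to the document. I adapted my code accordingly and ended up with the second version.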
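A likely reason for the cut-off content is a size mismatch. The no-argument com.itextpdf.text.Document uses an A4 page (595 x 842 pt) with 36 pt margins, while a slide is often wider than that, and the first version scales the image only to the slide's own size. The sketch below is plain arithmetic with no iText calls. The 960 x 540 pt slide size is an assumed example (a common 16:9 deck), and FitToPageSketch and fitScale are hypothetical names.

```java
/** Hypothetical arithmetic sketch: how much a slide must shrink to fit the default iText page. */
final class FitToPageSketch {

    static final float A4_WIDTH = 595f;   // points
    static final float A4_HEIGHT = 842f;  // points
    static final float MARGIN = 36f;      // default margin on each side

    /** Largest uniform scale, capped at 1, that makes a w x h image fit inside maxW x maxH. */
    static float fitScale(float w, float h, float maxW, float maxH) {
        return Math.min(1f, Math.min(maxW / w, maxH / h));
    }

    public static void main(String[] args) {
        float usableW = A4_WIDTH - 2 * MARGIN;   // 523 pt
        float usableH = A4_HEIGHT - 2 * MARGIN;  // 770 pt
        float slideW = 960f;                     // assumed slide width
        float slideH = 540f;                     // assumed slide height
        float scale = fitScale(slideW, slideH, usableW, usableH);
        System.out.println("scale factor: " + scale);
        System.out.println("fitted size: " + slideW * scale + " x " + slideH * scale);
    }
}
```

For the assumed slide the factor is roughly 0.54, so an unscaled image is almost twice as wide as the usable page area. Placing the image in a table cell with the fit flag set, as the workaround does, appears to perform this scaling to the cell width automatically.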

PPT to PDF, second version

public static void pptToPdf(String pptPath, String pdfPath) throws IOException {
    pdfPath = FileUtil.getNewFileFullPath(pptPath, pdfPath, "pdf");
    com.itextpdf.text.Document document = null;
    FileOutputStream fileOutputStream = null;
    PdfWriter pdfWriter = null;
    // try-with-resources closes the input stream once the slides have been rendered
    try (InputStream inputStream = Files.newInputStream(Paths.get(pptPath))) {
        SlideShow<?, ?> slideShow;
        String ext = pptPath.substring(pptPath.lastIndexOf("."));
        if (ext.equals(".pptx")) {
            slideShow = new XMLSlideShow(inputStream);
        } else {
            slideShow = new HSLFSlideShow(inputStream);
        }
        Dimension dimension = slideShow.getPageSize();
        fileOutputStream = new FileOutputStream(pdfPath);
        //document = new com.itextpdf.text.Document(new com.itextpdf.text.Rectangle((float) dimension.getWidth(), (float) dimension.getHeight()));
        document = new com.itextpdf.text.Document();
        pdfWriter = PdfWriter.getInstance(document, fileOutputStream);
        document.open();
        PdfPTable pdfPTable = new PdfPTable(1);
        for (Slide<?, ?> slide : slideShow.getSlides()) {
            // Set the font to avoid garbled Chinese text ("宋体" is SimSun)
            setPPTFont(slide, "宋体");
            BufferedImage bufferedImage = new BufferedImage((int) dimension.getWidth(), (int) dimension.getHeight(), BufferedImage.TYPE_INT_RGB);
            Graphics2D graphics2d = bufferedImage.createGraphics();
            graphics2d.setPaint(Color.white);
            graphics2d.setFont(new java.awt.Font("宋体", java.awt.Font.PLAIN, 12));
            slide.draw(graphics2d);
            graphics2d.dispose();
            com.itextpdf.text.Image image = com.itextpdf.text.Image.getInstance(bufferedImage, null);
            image.scaleToFit((float) dimension.getWidth(), (float) dimension.getHeight());
            // Write the image into a table cell; fit = true scales it to the cell
            pdfPTable.addCell(new PdfPCell(image, true));
            document.add(pdfPTable);
            // Clear the table so the next slide starts with an empty one
            pdfPTable.deleteBodyRows();
            document.newPage();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (document != null) {
                document.close();
            }
            if (fileOutputStream != null) {
                fileOutputStream.close();
            }
            if (pdfWriter != null) {
                pdfWriter.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
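With this version the slide content is shown in full, so at this point PPT-to-PDF conversion basically works. The display is still not ideal, though, because what we really want is for the PDF to look the same as the PPT. Since every slide in a deck has the same width and height, I wondered whether each PDF page could be given the same width and height as the slides. So I started reading the source behind the line that initializes the PDF document: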
com.itextpdf.text.Document document = new com.itextpdf.text.Document();
It turns out that, besides the default constructor, com.itextpdf.text.Document also has this constructor:
public Document(Rectangle pageSize) {
    this(pageSize, 36.0F, 36.0F, 36.0F, 36.0F);
}
Stepping into the com.itextpdf.text.Rectangle class then reveals a constructor that sets the width and height:
public Rectangle(float urx, float ury) {
    this(0.0F, 0.0F, urx, ury);
}
So I changed the Document initialization in my code as follows (starting from the first version; the PdfPTable from the second version is no longer needed):
document = new com.itextpdf.text.Document();
// changed to:
document = new com.itextpdf.text.Document(new com.itextpdf.text.Rectangle((float) dimension.getWidth(), (float) dimension.getHeight()));

PPT to PDF, third version (final)

public void pptToPdf(String pptPath, String pdfPath) throws IOException, DocumentException {
    List<BufferedImage> images = pptToBufferedImages(pptPath);
    if (images == null || images.isEmpty()) {
        return;
    }
    pdfPath = FileUtil.getNewFileFullPath(pptPath, pdfPath, "pdf");
    try (FileOutputStream fileOutputStream = new FileOutputStream(pdfPath)) {
        BufferedImage firstImage = images.get(0);
        // The page size equals the slide size and the margins are 0, so each slide fills its page
        com.itextpdf.text.Rectangle rectangle = new com.itextpdf.text.Rectangle((float) firstImage.getWidth(), (float) firstImage.getHeight());
        com.itextpdf.text.Document document = new com.itextpdf.text.Document(rectangle, 0, 0, 0, 0);
        PdfWriter pdfWriter = PdfWriter.getInstance(document, fileOutputStream);
        document.open();
        for (BufferedImage bufferedImage : images) {
            com.itextpdf.text.Image image = com.itextpdf.text.Image.getInstance(bufferedImage, null);
            //image.scaleToFit((float) image.getWidth(), (float) image.getHeight());
            document.add(image);
            document.newPage();
        }
        document.close();
        pdfWriter.close();
    }
}

// Render every slide to an image; returns null if the file cannot be read
private static List<BufferedImage> pptToBufferedImages(String pptPath) {
    List<BufferedImage> images = new ArrayList<>();
    // SlideShowFactory picks the ppt or pptx implementation based on the file content
    try (SlideShow<?, ?> slideShow = SlideShowFactory.create(new File(pptPath))) {
        Dimension dimension = slideShow.getPageSize();
        for (Slide<?, ?> slide : slideShow.getSlides()) {
            // Set the font to avoid garbled Chinese text ("宋体" is SimSun)
            setPPTFont(slide, "宋体");
            BufferedImage bufferedImage = new BufferedImage((int) dimension.getWidth(), (int) dimension.getHeight(), BufferedImage.TYPE_INT_RGB);
            Graphics2D graphics2d = bufferedImage.createGraphics();
            graphics2d.setPaint(Color.white);
            graphics2d.setFont(new java.awt.Font("宋体", java.awt.Font.PLAIN, 12));
            slide.draw(graphics2d);
            graphics2d.dispose();
            images.add(bufferedImage);
        }
        return images;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}

// Set the font of every text run on a slide
private static void setPPTFont(Slide<?, ?> slide, String fontFamily) {
    // Set the font to avoid garbled Chinese text
    for (Shape<?, ?> shape : slide.getShapes()) {
        if (shape instanceof TextShape) {
            TextShape<?, ?> textShape = (TextShape<?, ?>) shape;
            for (TextParagraph<?, ?, ?> textParagraph : textShape.getTextParagraphs()) {
                for (TextRun textRun : textParagraph.getTextRuns()) {
                    textRun.setFontFamily(fontFamily);
                }
            }
        }
    }
}
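A small detail in the rendering step: the listings call setPaint(Color.white) but never fill the canvas, and a new TYPE_INT_RGB image starts out black. Slides that draw their own background will cover this, but a slide without one might end up on a black page. The JDK-only sketch below shows one way to start from a white canvas; WhiteCanvasSketch and newWhiteCanvas are hypothetical names, and whether the extra fill is needed depends on the decks being converted.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical helper: creates the image a slide is drawn onto, pre-filled with white. */
final class WhiteCanvasSketch {

    /** Creates an RGB image of the given size with every pixel set to white. */
    static BufferedImage newWhiteCanvas(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setPaint(Color.WHITE);
            // setPaint only selects the colour; fillRect is what actually paints the pixels
            g.fillRect(0, 0, width, height);
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage blank = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage white = newWhiteCanvas(4, 4);
        System.out.println(Integer.toHexString(blank.getRGB(0, 0))); // ff000000 (opaque black)
        System.out.println(Integer.toHexString(white.getRGB(0, 0))); // ffffffff (opaque white)
    }
}
```

In the conversion code, the same fillRect call could go right after setPaint and before slide.draw, using the slide dimensions for the rectangle.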
The output now looks the same as the original PPT, and testing confirmed that both ppt and pptx files convert successfully.

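IV. Converting images to PDF

Converting images to PDF uses two libraries, Apache POI and itextpdf. itextpdf can parse only a limited set of image formats, as can be seen by opening the image-reading method in com.itextpdf.text.Image.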
