Online Document Preview, Part 1: Converting txt, Word, and PDF Files to Images for Online Preview

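If you don't want the articles on your web pages to be copied (yes, I mean a certain well-known site), or you want documents to be viewable online without downloading them (common on paid document download sites and in email attachment previews), what can you do? A common approach is to convert the documents into images. The code below uses aspose-words (for txt and Word to images) and pdfbox (for PDF to images), wrapped in a utility class that converts txt, Word, PDF, and similar files to images.
First, add the following two dependencies to the project's pom file: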
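<dependency>
    <groupId>com.luhuiguo</groupId>
    <artifactId>aspose-words</artifactId>
    <version>23.1</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.4</version>
</dependency>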

I. Converting files to images and saving them locally

1. Converting a Word file to images

public static void wordToImage(String wordPath, String imagePath) throws Exception {
    Document doc = new Document(wordPath);
    File file = new File(wordPath);
    String filename = file.getName();
    // Output prefix: <imagePath>/<file name without extension>
    String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
    for (int i = 0; i < doc.getPageCount(); i++) {
        // Extract page i as a one-page document and save it as a PNG
        Document extractedPage = doc.extractPages(i, 1);
        String path = pathPre + (i + 1) + ".png";
        extractedPage.save(path, SaveFormat.PNG);
    }
}
Verification:
public static void main(String[] args) throws Exception {
    FileConvertUtil.wordToImage("D:\\书籍\\电子书\\其它\\《山海经》异兽图.doc", "D:\\test\\word");
}
Verification result: [screenshot]

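2. Converting a txt file to images (same as the Word conversion)

public static void txtToImage(String txtPath, String imagePath) throws Exception {
    // aspose-words loads plain text through the same Document class, so the Word path is reused
    wordToImage(txtPath, imagePath);
}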
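Verification (this call repeats the Word example; to exercise the txt path, call txtToImage with a .txt file):
public static void main(String[] args) throws Exception {
    FileConvertUtil.wordToImage("D:\\书籍\\电子书\\其它\\《山海经》异兽图.doc", "D:\\test\\word");
}
Verification result: [screenshot]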

3. Converting a PDF file to images

public static void pdfToImage(String pdfPath, String imagePath) throws Exception {
    File file = new File(pdfPath);
    String filename = file.getName();
    // Output prefix: <imagePath>/<file name without extension>
    String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
    // try-with-resources closes the document even if rendering fails
    try (PDDocument doc = PDDocument.load(file)) {
        PDFRenderer renderer = new PDFRenderer(doc);
        for (int i = 0; i < doc.getNumberOfPages(); i++) {
            // 144 DPI is twice the 72 DPI base resolution of PDF
            BufferedImage image = renderer.renderImageWithDPI(i, 144);
            String pathname = pathPre + (i + 1) + ".png";
            ImageIO.write(image, "PNG", new File(pathname));
        }
    }
}
Verification:
public static void main(String[] args) throws Exception {
    FileConvertUtil.pdfToImage("D:\\书籍\\电子书\\其它\\自然哲学的数学原理.pdf", "D:\\test\\pdf");
}
Verification result: [screenshot]
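The conversion methods above build each output name as the output directory plus the source file's base name and a one-based page number, and they assume the output directory already exists. The following is a minimal sketch of that naming step in isolation, with the directory created on demand. `PagePathSketch` and `pageImagePath` are hypothetical names used only for illustration; they are not part of the article's FileConvertUtil.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the article's FileConvertUtil.
class PagePathSketch {

    /**
     * Builds <imageDir>/<baseName><pageNumber>.png and creates imageDir if needed.
     * pageIndex is zero-based; the file name uses a one-based page number,
     * matching the naming used by the conversion methods in this article.
     */
    static Path pageImagePath(String imageDir, String sourceFile, int pageIndex) {
        try {
            Path dir = Files.createDirectories(Paths.get(imageDir));
            String name = Paths.get(sourceFile).getFileName().toString();
            int dot = name.lastIndexOf('.');
            // Fall back to the whole name when there is no extension
            String baseName = dot > 0 ? name.substring(0, dot) : name;
            return dir.resolve(baseName + (pageIndex + 1) + ".png");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String dir = System.getProperty("java.io.tmpdir") + "/preview-sketch";
        System.out.println(pageImagePath(dir, "book.pdf", 0).getFileName());
    }
}
```

Guarding against a missing extension also avoids the exception that `substring(0, lastIndexOf("."))` throws when the file name contains no dot.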

4. Supporting multiple file types in one method
public static void fileToImage(String sourceFilePath, String imagePath) throws Exception {
    // Dispatch on the file extension (including the dot)
    String ext = sourceFilePath.substring(sourceFilePath.lastIndexOf("."));
    switch (ext) {
        case ".doc":
        case ".docx":
            wordToImage(sourceFilePath, imagePath);
            break;
        case ".pdf":
            pdfToImage(sourceFilePath, imagePath);
            break;
        case ".txt":
            txtToImage(sourceFilePath, imagePath);
            break;
        default:
            System.out.println("Unsupported file format");
    }
}
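The dispatch above compares the raw extension, so a file named Report.DOCX is reported as unsupported and a path without a dot makes `substring` throw. A minimal sketch of a more forgiving extension lookup follows. `ExtensionSketch`, `extensionOf`, and `converterFor` are hypothetical names introduced for illustration only, and `converterFor` returns a label rather than calling the real converters so that the example runs without aspose-words or pdfbox.

```java
import java.util.Locale;

// Hypothetical helper, not part of the article's FileConvertUtil.
class ExtensionSketch {

    /** Returns the lower-cased extension including the dot, or "" if there is none. */
    static String extensionOf(String path) {
        // Only inspect the last path segment, so "dir.v2/readme" has no extension
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return "";
        }
        return name.substring(dot).toLowerCase(Locale.ROOT);
    }

    /** Maps a path to the converter family that the article's switch would pick. */
    static String converterFor(String path) {
        switch (extensionOf(path)) {
            case ".doc":
            case ".docx":
                return "word";
            case ".pdf":
                return "pdf";
            case ".txt":
                return "txt";
            default:
                return "unsupported";
        }
    }

    public static void main(String[] args) {
        System.out.println(converterFor("D:\\test\\Report.DOCX")); // prints "word"
        System.out.println(converterFor("notes"));                 // prints "unsupported"
    }
}
```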

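II. Using multiple threads to speed up writing files locally

While converting Newton's 669-page masterpiece 《自然哲学的数学原理》 (Mathematical Principles of Natural Philosophy), I noticed that the run took a long time: 140,281 ms. An I/O-intensive operation like this can be sped up with multiple threads, so I wrote a multithreaded version.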
Time taken to export 自然哲学的数学原理.pdf synchronously: [screenshot]

The optimized code:
public static void pdfToImageAsync(String pdfPath, String imagePath) throws Exception {
    long old = System.currentTimeMillis();
    File file = new File(pdfPath);
    PDDocument doc = PDDocument.load(file);
    // Caution: the PDFBox FAQ says a single PDDocument should only be accessed by one
    // thread at a time; sharing the renderer across workers follows the original article.
    PDFRenderer renderer = new PDFRenderer(doc);
    int pageCount = doc.getNumberOfPages();
    int numCores = Runtime.getRuntime().availableProcessors();
    ExecutorService executorService = Executors.newFixedThreadPool(numCores);
    for (int i = 0; i < pageCount; i++) {
        int finalI = i; // lambdas can only capture effectively final variables
        executorService.submit(() -> {
            try {
                BufferedImage image = renderer.renderImageWithDPI(finalI, 144); // 144 DPI
                String filename = file.getName();
                filename = filename.substring(0, filename.lastIndexOf("."));
                String pathname = imagePath + File.separator + filename + (finalI + 1) + ".png";
                ImageIO.write(image, "PNG", new File(pathname));
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        });
    }
    // Stop accepting new tasks, then wait for all pages to finish before closing the document
    executorService.shutdown();
    executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    doc.close();
    long now = System.currentTimeMillis();
    System.out.println("pdfToImage multithreaded conversion finished, elapsed: " + (now - old) + "ms");
}
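Time taken to export 自然哲学的数学原理.pdf with multiple threads: [screenshot]

This run took only 24,045 ms, roughly one sixth of the original time, which is a large improvement. Besides PDF, the Word and txt conversions can be parallelized in the same way: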
// Convert Word to images (multithreaded)
public static void wordToImageAsync(String wordPath, String imagePath) throws Exception {
    Document doc = new Document(wordPath);
    File file = new File(wordPath);
    String filename = file.getName();
    String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
    int numCores = Runtime.getRuntime().availableProcessors();
    ExecutorService executorService = Executors.newFixedThreadPool(numCores);
    for (int i = 0; i < doc.getPageCount(); i++) {
        int finalI = i;
        executorService.submit(() -> {
            try {
                Document extractedPage = doc.extractPages(finalI, 1);
                String path = pathPre + (finalI + 1) + ".png";
                extractedPage.save(path, SaveFormat.PNG);
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        });
    }
    // As in pdfToImageAsync: without shutdown the pool threads keep the JVM alive,
    // and without awaitTermination the method returns before the pages are written
    executorService.shutdown();
    executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
}

// Convert txt to images (multithreaded)
public static void txtToImageAsync(String txtPath, String imagePath) throws Exception {
    wordToImageAsync(txtPath, imagePath);
}
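The multithreaded methods share one pattern: submit one task per page to a fixed thread pool, then wait for all of them. The sketch below isolates that pattern using only the JDK, with a placeholder task standing in for the actual rendering. `PageTaskRunnerSketch` and `runPerPage` are hypothetical names, not part of the article's utility class. Unlike the versions above, which print a stack trace and carry on, this variant rethrows the first page failure; that is a design choice of the sketch rather than the article's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

// Hypothetical helper, not part of the article's FileConvertUtil.
class PageTaskRunnerSketch {

    /**
     * Runs pageTask once per zero-based page index on a fixed thread pool and
     * blocks until every page is done. A failure in any page is rethrown.
     */
    static void runPerPage(int pageCount, IntConsumer pageTask) {
        int numCores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(numCores);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < pageCount; i++) {
                final int page = i;
                tasks.add(() -> {
                    pageTask.accept(page);
                    return null;
                });
            }
            // invokeAll returns only after every task has completed
            for (Future<Void> future : pool.invokeAll(tasks)) {
                future.get(); // throws ExecutionException if that page failed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while converting pages", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A page failed to convert", e.getCause());
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) {
        // Stand-in for rendering: report which page would be processed
        runPerPage(4, page -> System.out.println("page " + (page + 1) + " done"));
    }
}
```

In a real conversion, the lambda passed to `runPerPage` would hold the per-page body of `pdfToImageAsync` or `wordToImageAsync`. The order of the printed lines is not deterministic because pages finish in whatever order the threads complete.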

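III. Converting files to image streams

Sometimes the converted images don't need to be written to local disk; instead they need to be returned to a caller or uploaded to an image server. In that case the converted images should be turned into byte streams so they are easy to transfer. Example code: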

1. Converting a Word file to image streams

public static List<byte[]> wordToImageStream(String wordPath) throws Exception {
    Document doc = new Document(wordPath);
    List<byte[]> list = new ArrayList<>();
    for (int i = 0; i < doc.getPageCount(); i++) {
        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
            // Save each page as a PNG into memory instead of onto disk
            Document extractedPage = doc.extractPages(i, 1);
            extractedPage.save(outputStream, SaveFormat.PNG);
            list.add(outputStream.toByteArray());
        }
    }
    return list;
}

2. Converting a txt file to image streams

public static List<byte[]> txtToImageStream(String txtPath) throws Exception {
    return wordToImageStream(txtPath);
}

3. Converting a PDF file to image streams

public static List<byte[]> pdfToImageStream(String pdfPath) throws Exception {
    File file = new File(pdfPath);
    List<byte[]> list = new ArrayList<>();
    // try-with-resources closes the document even if rendering fails
    try (PDDocument doc = PDDocument.load(file)) {
        PDFRenderer renderer = new PDFRenderer(doc);
        for (int i = 0; i < doc.getNumberOfPages(); i++) {
            try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
                BufferedImage image = renderer.renderImageWithDPI(i, 144); // 144 DPI
                ImageIO.write(image, "PNG", outputStream);
                list.add(outputStream.toByteArray());
            }
        }
    }
    return list;
}

4. Supporting multiple file types for image streams

public static List<byte[]> fileToImageStream(String pdfPath) throws Exception {
    // Despite its name, pdfPath may point to any supported file type
    String ext = pdfPath.substring(pdfPath.lastIndexOf("."));
    switch (ext) {
        case ".doc":
        case ".docx":
            return wordToImageStream(pdfPath);
        case ".pdf":
            return pdfToImageStream(pdfPath);
        case ".txt":
            return txtToImageStream(pdfPath);
        default:
            System.out.println("Unsupported file format");
    }
    return null;
}
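Each byte array returned by the stream methods is a complete PNG file held in memory. The following minimal sketch shows that round trip using only the JDK's ImageIO, with a blank image standing in for a rendered page. `PngBytesSketch`, `toPngBytes`, and `fromPngBytes` are hypothetical names introduced for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of the article's FileConvertUtil.
class PngBytesSketch {

    /** Encodes an image as PNG into memory, as pdfToImageStream does for each page. */
    static byte[] toPngBytes(BufferedImage image) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            ImageIO.write(image, "PNG", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes PNG bytes back into an image, for example on the receiving side. */
    static BufferedImage fromPngBytes(byte[] png) {
        try {
            return ImageIO.read(new ByteArrayInputStream(png));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A blank 200x100 image stands in for a rendered page
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        byte[] png = toPngBytes(page);
        BufferedImage decoded = fromPngBytes(png);
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight()); // prints "200x100"
    }
}
```

Because the bytes are already PNG-encoded, they can be written to an HTTP response or uploaded as-is without further conversion.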

Finally, here is the complete utility class:

package com.fhey.service.common.utils.file;

import com.aspose.words.Document;
import com.aspose.words.SaveFormat;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FileConvertUtil {

    // Convert a file to images, dispatching on its extension
    public static void fileToImage(String sourceFilePath, String imagePath) throws Exception {
        String ext = sourceFilePath.substring(sourceFilePath.lastIndexOf("."));
        switch (ext) {
            case ".doc":
            case ".docx":
                wordToImage(sourceFilePath, imagePath);
                break;
            case ".pdf":
                pdfToImage(sourceFilePath, imagePath);
                break;
            case ".txt":
                txtToImage(sourceFilePath, imagePath);
                break;
            default:
                System.out.println("Unsupported file format");
        }
    }

    // Convert PDF to images
    public static void pdfToImage(String pdfPath, String imagePath) throws Exception {
        File file = new File(pdfPath);
        String filename = file.getName();
        String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
        // try-with-resources closes the document even if rendering fails
        try (PDDocument doc = PDDocument.load(file)) {
            PDFRenderer renderer = new PDFRenderer(doc);
            for (int i = 0; i < doc.getNumberOfPages(); i++) {
                BufferedImage image = renderer.renderImageWithDPI(i, 144); // 144 DPI
                String pathname = pathPre + (i + 1) + ".png";
                ImageIO.write(image, "PNG", new File(pathname));
            }
        }
    }

    // Convert txt to images
    public static void txtToImage(String txtPath, String imagePath) throws Exception {
        wordToImage(txtPath, imagePath);
    }

    // Convert Word to images
    public static void wordToImage(String wordPath, String imagePath) throws Exception {
        Document doc = new Document(wordPath);
        File file = new File(wordPath);
        String filename = file.getName();
        String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
        for (int i = 0; i < doc.getPageCount(); i++) {
            Document extractedPage = doc.extractPages(i, 1);
            String path = pathPre + (i + 1) + ".png";
            extractedPage.save(path, SaveFormat.PNG);
        }
    }

    // Convert PDF to images (multithreaded)
    public static void pdfToImageAsync(String pdfPath, String imagePath) throws Exception {
        long old = System.currentTimeMillis();
        File file = new File(pdfPath);
        PDDocument doc = PDDocument.load(file);
        // Caution: the PDFBox FAQ says a single PDDocument should only be accessed by one
        // thread at a time; sharing the renderer across workers follows the original article.
        PDFRenderer renderer = new PDFRenderer(doc);
        int pageCount = doc.getNumberOfPages();
        int numCores = Runtime.getRuntime().availableProcessors();
        ExecutorService executorService = Executors.newFixedThreadPool(numCores);
        for (int i = 0; i < pageCount; i++) {
            int finalI = i;
            executorService.submit(() -> {
                try {
                    BufferedImage image = renderer.renderImageWithDPI(finalI, 144); // 144 DPI
                    String filename = file.getName();
                    filename = filename.substring(0, filename.lastIndexOf("."));
                    String pathname = imagePath + File.separator + filename + (finalI + 1) + ".png";
                    ImageIO.write(image, "PNG", new File(pathname));
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            });
        }
        executorService.shutdown();
        executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        doc.close();
        long now = System.currentTimeMillis();
        System.out.println("pdfToImage multithreaded conversion finished, elapsed: " + (now - old) + "ms");
    }

    // Convert Word to images (multithreaded)
    public static void wordToImageAsync(String wordPath, String imagePath) throws Exception {
        Document doc = new Document(wordPath);
        File file = new File(wordPath);
        String filename = file.getName();
        String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
        int numCores = Runtime.getRuntime().availableProcessors();
        ExecutorService executorService = Executors.newFixedThreadPool(numCores);
        for (int i = 0; i < doc.getPageCount(); i++) {
            int finalI = i;
            executorService.submit(() -> {
                try {
                    Document extractedPage = doc.extractPages(finalI, 1);
                    String path = pathPre + (finalI + 1) + ".png";
                    extractedPage.save(path, SaveFormat.PNG);
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            });
        }
        // Release the pool and wait for all pages, as in pdfToImageAsync
        executorService.shutdown();
        executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    }

    // Convert txt to images (multithreaded)
    public static void txtToImageAsync(String txtPath, String imagePath) throws Exception {
        wordToImageAsync(txtPath, imagePath);
    }

    // Convert a file to image streams, dispatching on its extension
    public static List<byte[]> fileToImageStream(String pdfPath) throws Exception {
        String ext = pdfPath.substring(pdfPath.lastIndexOf("."));
        switch (ext) {
            case ".doc":
            case ".docx":
                return wordToImageStream(pdfPath);
            case ".pdf":
                return pdfToImageStream(pdfPath);
            case ".txt":
                return txtToImageStream(pdfPath);
            default:
                System.out.println("Unsupported file format");
        }
        return null;
    }

    // Convert PDF to image streams
    public static List<byte[]> pdfToImageStream(String pdfPath) throws Exception {
        File file = new File(pdfPath);
        List<byte[]> list = new ArrayList<>();
        try (PDDocument doc = PDDocument.load(file)) {
            PDFRenderer renderer = new PDFRenderer(doc);
            for (int i = 0; i < doc.getNumberOfPages(); i++) {
                try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
                    BufferedImage image = renderer.renderImageWithDPI(i, 144); // 144 DPI
                    ImageIO.write(image, "PNG", outputStream);
                    list.add(outputStream.toByteArray());
                }
            }
        }
        return list;
    }

    // Convert Word to image streams
    public static List<byte[]> wordToImageStream(String wordPath) throws Exception {
        Document doc = new Document(wordPath);
        List<byte[]> list = new ArrayList<>();
        for (int i = 0; i < doc.getPageCount(); i++) {
            try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
                Document extractedPage = doc.extractPages(i, 1);
                extractedPage.save(outputStream, SaveFormat.PNG);
                list.add(outputStream.toByteArray());
            }
        }
        return list;
    }

    // Convert txt to image streams
    public static List<byte[]> txtToImageStream(String txtPath) throws Exception {
        return wordToImageStream(txtPath);
    }
}
