【Requirement】
A recent project required reading the text content of an uploaded Word file in Java.
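【Implementation】
Suppose we have a Word document with some sample content.
The implementation is as follows.
Add the dependencies: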
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.2</version>
</dependency>
Write the utility class as follows:
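import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.ooxml.POIXMLDocument;
import org.apache.poi.ooxml.extractor.POIXMLTextExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordUtil {

    // Reads the text of a Word file identified by its path.
    public static String readDocContent(String wordPath) throws Exception {
        String content = "";
        if (wordPath.endsWith(".doc")) {
            // Legacy binary format: use the HWPF text extractor.
            // try-with-resources closes the stream and extractor even if getText() fails.
            try (FileInputStream fileInputStream = new FileInputStream(wordPath);
                 WordExtractor wordExtractor = new WordExtractor(fileInputStream)) {
                content = wordExtractor.getText();
            }
        } else if (wordPath.endsWith(".docx")) {
            // OOXML format: open the package and use the XWPF text extractor.
            OPCPackage opcPackage = POIXMLDocument.openPackage(wordPath);
            try (POIXMLTextExtractor textExtractor = new XWPFWordExtractor(opcPackage)) {
                content = textExtractor.getText();
            }
        } else {
            // SysException is the project's own exception type and is not defined in this
            // post; it must be unchecked for the second overload to compile. Substitute
            // e.g. IllegalArgumentException if you do not have it.
            throw new SysException("This file is not a Word file");
        }
        return content;
    }

    // Reads the text of a Word file from a stream; fileName is used only to pick the format.
    public static String readDocContent(InputStream inputStream, String fileName) throws IOException {
        String content = "";
        if (fileName.endsWith(".doc")) {
            // Legacy binary format: use the HWPF text extractor.
            try (WordExtractor wordExtractor = new WordExtractor(inputStream)) {
                content = wordExtractor.getText();
            }
        } else if (fileName.endsWith(".docx")) {
            XWPFDocument xwpfDocument = new XWPFDocument(inputStream);
            // OOXML format: use the XWPF text extractor.
            try (POIXMLTextExtractor textExtractor = new XWPFWordExtractor(xwpfDocument)) {
                content = textExtractor.getText();
            }
        } else {
            throw new SysException("This file is not a Word file");
        }
        return content;
    }
}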
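One thing to watch in the utility above: the `endsWith(".doc")` and `endsWith(".docx")` checks are case-sensitive, so an upload named `Report.DOCX` would fall through to the exception branch. Below is a minimal sketch of a case-insensitive check. The `WordFormats` class and `WordFormat` enum are hypothetical names introduced here for illustration; they are not part of the original utility or of Apache POI.

```java
import java.util.Locale;

final class WordFormats {

    /** The two formats the utility distinguishes, plus a catch-all. */
    enum WordFormat { DOC, DOCX, UNKNOWN }

    /** Classifies a file name by its extension, ignoring case and surrounding whitespace. */
    static WordFormat fromFileName(String fileName) {
        if (fileName == null) {
            return WordFormat.UNKNOWN;
        }
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        String lower = fileName.trim().toLowerCase(Locale.ROOT);
        if (lower.endsWith(".docx")) {
            return WordFormat.DOCX;
        }
        if (lower.endsWith(".doc")) {
            return WordFormat.DOC;
        }
        return WordFormat.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(fromFileName("Report.DOCX")); // DOCX
        System.out.println(fromFileName("old.doc"));     // DOC
        System.out.println(fromFileName("notes.txt"));   // UNKNOWN
    }
}
```

With something like this in place, `readDocContent` could switch on the returned value instead of calling `endsWith` directly.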
Write a test to try it out:
@Test
public void testReadDoc() {
    String wordPath = "C:\\Users\\Administrator\\Desktop\\ktest.docx";

    // Read the content by file path
    try {
        String content = WordUtil.readDocContent(wordPath);
        System.err.println(content);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }

    // Read the content from an input stream (closed automatically afterwards)
    try (FileInputStream in = new FileInputStream(wordPath)) {
        String content2 = WordUtil.readDocContent(in, "ktest.docx");
        System.err.println(content2);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
Running the test prints the document's text twice, once for each method.
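In the stream-based call, the format is chosen purely from the file name passed alongside the stream. For uploaded files that name comes from the client and may not match the actual content, so it can help to look at the first bytes of the stream as well. The sketch below uses only the JDK and relies on two well-known signatures: legacy `.doc` files are OLE2 compound files, and `.docx` files are ZIP archives. The `WordMagic` class and its method names are hypothetical, introduced only for illustration. Note the limitation: other Office formats share the same containers (`.xls` is also OLE2, `.xlsx` is also ZIP), so this narrows down the container rather than proving the file is a Word document.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class WordMagic {

    enum Container { OLE2, ZIP, UNKNOWN }

    // Header signature of OLE2 compound files (the container of legacy .doc files).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header: 'P', 'K', 0x03, 0x04 (the container of .docx files).
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies the first len bytes of a file. */
    static Container classify(byte[] head, int len) {
        if (startsWith(head, len, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(head, len, ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    /** Peeks at the stream header, then rewinds so the caller can still read from the start. */
    static Container sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        byte[] head = new byte[OLE2_MAGIC.length];
        in.mark(head.length);
        int len = 0;
        while (len < head.length) {
            int n = in.read(head, len, head.length - len);
            if (n < 0) {
                break; // stream ended before a full header was read
            }
            len += n;
        }
        in.reset();
        return classify(head, len);
    }

    private static boolean startsWith(byte[] data, int len, byte[] prefix) {
        if (len < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Not a real document, just bytes that begin with the ZIP signature.
        byte[] fakeDocx = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(fakeDocx));
        System.out.println(sniff(in)); // ZIP
        System.out.println(in.read()); // 80 (0x50): the stream was rewound
    }
}
```

Because `sniff` resets the stream, the same `InputStream` can be handed to `WordUtil.readDocContent` afterwards.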
Source: https://blog.csdn.net/weixin_44117737/article/details/131451747
Title: Reading the text content of a Word file in Java
Link: https://www.lsjlt.com/news/417099.html (please cite this link when reposting)