欢迎来到尧图网

客户服务 关于我们

您的位置:首页 > 科技 > 能源 > python将pdf转换成word

python将pdf转换成word

2025/3/9 4:27:55 来源:https://blog.csdn.net/cf8833/article/details/146069898  浏览:    关键词:python将pdf转换成word

说明:
我计划用python,把pdf文件转换成word文件
step1:把python环境安装好,然后把helloworld跑起来

step2:安装依赖: 首先需要安装必要的Python库,在终端中运行,会开始下载依赖包,等待下载完成

C:\Users\Administrator>pip --version
pip 25.0.1 from C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\pip (python 3.12)C:\Users\Administrator>pip install pdf2docx python-docxCollecting pdf2docxUsing cached pdf2docx-0.5.8-py3-none-any.whl.metadata (3.2 kB)Downloading fonttools-4.56.0-cp311-cp311-win_amd64.whl.metadata (103 kB)
Collecting numpy>=1.17.2 (from pdf2docx)Downloading numpy-2.2.3-cp311-cp311-win_amd64.whl.metadata (60 kB)
Collecting opencv-python-headless>=4.5 (from pdf2docx)Using cached opencv_python_headless-4.11.0.86-cp37-abi3-win_amd64.whl.metadata (20 kB)
Collecting fire>=0.3.0 (from pdf2docx)Using cached fire-0.7.0.tar.gz (87 kB)Preparing metadata (setup.py) ... done
Collecting lxml>=3.1.0 (from python-docx)Downloading lxml-5.3.1-cp311-cp311-win_amd64.whl.metadata (3.8 kB)
Collecting typing-extensions>=4.9.0 (from python-docx)Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting termcolor (from fire>=0.3.0->pdf2docx)Using cached termcolor-2.5.0-py3-none-any.whl.metadata (6.1 kB)
Using cached pdf2docx-0.5.8-py3-none-any.whl (132 kB)
Using cached python_docx-1.1.2-py3-none-any.whl (244 kB)
Downloading fonttools-4.56.0-cp311-cp311-win_amd64.whl (2.2 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 104.8 kB/s eta 0:00:00
Downloading lxml-5.3.1-cp311-cp311-win_amd64.whl (3.8 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 91.1 kB/s eta 0:00:00
Downloading numpy-2.2.3-cp311-cp311-win_amd64.whl (12.9 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.9/12.9 MB 95.3 kB/s eta 0:00:00
Downloading opencv_python_headless-4.11.0.86-cp37-abi3-win_amd64.whl (39.4 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.4/39.4 MB 94.9 kB/s eta 0:00:00
Downloading pymupdf-1.25.3-cp39-abi3-win_amd64.whl (16.5 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.5/16.5 MB 91.1 kB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading termcolor-2.5.0-py3-none-any.whl (7.8 kB)
Building wheels for collected packages: fireBuilding wheel for fire (setup.py) ... doneCreated wheel for fire: filename=fire-0.7.0-py3-none-any.whl size=114261 sha256=d635b5056afde4d2d8f1b04a79ad6c18207aef0d16c499c683ef7eac73194231Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\46\54\24\1624fd5b8674eb1188623f7e8e17cdf7c0f6c24b609dfb8a89
Successfully built fire
Installing collected packages: typing-extensions, termcolor, PyMuPDF, numpy, lxml, fonttools, python-docx, opencv-python-headless, fire, pdf2docx
Successfully installed PyMuPDF-1.25.3 fire-0.7.0 fonttools-4.56.0 lxml-5.3.1 numpy-2.2.3 opencv-python-headless-4.11.0.86 pdf2docx-0.5.8 python-docx-1.1.2 termcolor-2.5.0 typing-extensions-4.12.2

step3:C:\Users\Administrator\PycharmProjects\PythonProject2.venv\Scripts\activate_this.py

# pdf_to_word.py
import os
from pdf2docx import Converter
from datetime import datetimedef convert_pdf_to_word(input_path, output_folder):"""将PDF转换为Word文档参数:input_path (str): 输入的PDF文件路径output_folder (str): 输出文件夹路径返回:str: 生成的Word文件路径"""try:# 生成输出文件名(带时间戳)filename = os.path.basename(input_path)output_name = f"{os.path.splitext(filename)[0]}_converted_{datetime.now().strftime('%Y%m%d%H%M%S')}.docx"output_path = os.path.join(output_folder, output_name)# 初始化转换器cv = Converter(input_path)# 设置进度回调函数def progress_callback(percentage):print(f"\r转换进度: {percentage:.0f}%", end='', flush=True)# 执行转换(多线程优化)cv.convert(output_path,start=0,end=None,progress_callback=progress_callback)cv.close()print(f"\n转换成功!文件已保存至:{output_path}")return output_pathexcept Exception as e:print(f"\n转换失败:{str(e)}")return Noneif __name__ == "__main__":# 输入输出路径配置desktop = os.path.join(os.path.expanduser("~"), "Desktop")input_pdf = r"C:\Users\Administrator\Desktop\qe.pdf"output_dir = desktop# 检查输入文件if not os.path.exists(input_pdf):print(f"错误:输入的PDF文件不存在 - {input_pdf}")exit(1)# 创建输出目录(如果不存在)os.makedirs(output_dir, exist_ok=True)# 执行转换print("开始PDF转换...")result = convert_pdf_to_word(input_pdf, output_dir)if result:print("操作完成!")else:print("转换过程中出现问题。")

step4:点击run运行

C:\Users\Administrator\PycharmProjects\PythonProject2\.venv\Scripts\python.exe C:\Users\Administrator\PycharmProjects\PythonProject2\.venv\Scripts\activate_this.py 
[INFO] Start to convert C:\Users\Administrator\Desktop\files.pdf
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
开始PDF转换...
[INFO] [3/4] Parsing pages...
[INFO] (1/1) Page 1
[INFO] [4/4] Creating pages...
[INFO] (1/1) Page 1
[INFO] Terminated in 0.10s.转换成功!文件已保存至:C:\Users\Administrator\Desktop\files_converted_20250306142813.docx
操作完成!Process finished with exit code 0

end

版权声明:

本网仅为发布的内容提供存储空间,不对发表、转载的内容提供任何形式的保证。凡本网注明“来源:XXX网络”的作品,均转载自其它媒体,著作权归作者所有,商业转载请联系作者获得授权,非商业转载请注明出处。

我们尊重并感谢每一位作者,均已注明文章来源和作者。如因作品内容、版权或其它问题,请及时与我们联系,联系邮箱:809451989@qq.com,投稿邮箱:809451989@qq.com

热搜词