python 实现 docx 文本替换

目的

由于商业原因，最近要实现一个功能，就是word文档（docx格式）的批量文本替换，把docx中的商标名换成自己的，替换过程中要保留word文档的样式。
在网上搜寻多种方案，有 python-docx, 有 apache poi, 均不符合我的要求，主要是这些库解析docx文档能力有限，文档不全，而且有的内容无法识别到，
有的即使替换了，样式也乱了，下面是我自己的方案，效果堪称完美。

思路

docx 文件其实是zip文件，在windows上，我们将docx文件重命名为zip文件，解压，即可得到下面的文件：

可以看到 docx文件本质是资源文件 + xml 文件的集合，而 word文档中的文字都在 word/document.xml 中。
我们只要替换这个文件，再重新压缩为zip,再改成.docx格式，即可实现替换。

代码实现

import zipfile
import shutil
import os
import file_util
'''
input_f 输入文件路径
out_f 输出文件路径
'''
def replace_text(input_f, out_f):
owd = os.getcwd()
# 将淘宝替换为京东
replacements = {
'淘宝': '京东',
}
tmp_dir = "D:\\BaiduNetdiskDownload\\tmp\\"
try:
filename = input_f.split("\\")[-1]
os.chdir(tmp_dir)
shutil.copy(input_f, tmp_dir) # 复制到临时目录
copy_file = os.path.join(tmp_dir, filename)
zip_file = copy_file.replace(".docx", ".zip")
os.rename(copy_file, zip_file)
# 解压文件
with zipfile.ZipFile(zip_file, "r") as zip_ref:
zip_ref.extractall(tmp_dir)
os.remove(zip_file)
# 替换文本
with open('word\\document.xml', 'r', encoding="utf8") as file:
str = file.read()
for k, v in replacements.items():
str = str.replace(k, v)
text_file = open("out.xml", "w", encoding="utf8")
n = text_file.write(str)
text_file.close()
# close input and output files
os.remove("word\\document.xml")
os.rename("out.xml", "word\\document.xml")
# 重命名文件
dst_zip_file_name = filename.replace(".docx", ".zip")
shutil.make_archive(filename.replace(".docx", ""), 'zip', tmp_dir)
os.rename(dst_zip_file_name, out_f)
except Exception as e:
pass
finally:
# 将临时目录清空
file_util.clean_dir(tmp_dir)
os.chdir(owd)

python 实现 docx 文本替换

目的

思路

代码实现

php include 找不到变量

热门标签

关注我们么么哒！

python 实现 docx 文本替换

目的

思路

代码实现

微信扫一扫,分享到朋友圈

php include 找不到变量

猜你喜欢

热门标签

关注我们 么么哒！

关注我们的公众号

关注我们么么哒！