51. Python Data Processing (2)

Data Processing, Python | 2023-01-31 06:01:09 | 401 views | 薄情痞子

1. Modifying an Excel file with Python

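import xlrd
import xlutils.copy

# xlrd 2.0 and later read only the legacy .xls format, and xlwt (which
# xlutils.copy builds on) writes only .xls, so a .xls file is used here.
excelr = xlrd.open_workbook("hello.xls")        # open the existing workbook for reading
excelw = xlutils.copy.copy(excelr)              # make a writable copy of it
sheet1 = excelw.get_sheet(0)                    # first sheet of the copy
sheet1.write(3, 5, "xlutils.copy test test")    # zero-based (row, column)
excelw.save("hello.xls") 	# Saving under the same name overwrites the original file; a different name creates a new file.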

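# As the code above shows, an existing file cannot be modified with xlwt directly. Use xlutils.copy to make a copy, modify the copy, and then save it under a new name or over the original file.

# Sheet contents before the change (screenshot)

# Sheet contents after the change (screenshot)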
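
The call sheet1.write(3, 5, ...) above takes zero-based row and column indexes, which is easy to misread next to Excel's own cell names. The following is a small sketch, not part of xlwt or xlutils, that converts such an index pair to the familiar A1-style name; the helper name cell_name is made up for this illustration.

```python
def cell_name(row, col):
    """Convert zero-based (row, col) indexes to an Excel A1-style cell name."""
    if row < 0 or col < 0:
        raise ValueError("row and col must be non-negative")
    letters = ""
    n = col + 1  # switch to 1-based for the base-26 letter conversion
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"


# The cell written by sheet1.write(3, 5, ...) in the example above
print(cell_name(3, 5))
```

So the text written at (3, 5) ends up in row 4 of column F when the file is opened in a spreadsheet program.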

2. Creating a new Excel file with named sheets and writing to each sheet

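import xlwt

# The first argument of Workbook() is an encoding, not a file name;
# the file name is passed to save().
excel = xlwt.Workbook()
sheet1 = excel.add_sheet("sheet5")
sheet2 = excel.add_sheet("sheet2")
sheet3 = excel.add_sheet("sheet3")
sheet1.write(0, 0, "hello world")   # row 0, column 0 of "sheet5"
sheet2.write(1, 0, "hello")         # row 1, column 0 of "sheet2"
sheet3.write(2, 0, "test test")     # row 2, column 0 of "sheet3"
excel.save("hello1.xls")            # xlwt writes the legacy .xls format

Result:

Open hello1.xls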

3. Working with PDF files

(1) Reading a PDF file

On Python 3, install pdfminer3k:

# pip install pdfminer3k

from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfparser import PDFPage
from pdfminer.pdfinterp import PDFResourceManager, PDFTextExtractionNotAllowed
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfdevice import PDFDevice
from pdfminer.layout import LAParams
from pdfminer.converter import PDFPageAggregator

# Open the PDF in binary mode; replace this path with your own file.
fp = open("C:\\Users\\Shinelon\\PycharmProjects\\Python3\\datachuli\\aminglinux\\chapter1.pdf", "rb")
# Create a parser for the file and link it to a document object
parser = PDFParser(fp)
doc = PDFDocument()
parser.set_document(doc)
doc.set_parser(parser)
# Initialize the document; pass a password only if the PDF has one.
doc.initialize()
# Check whether the file allows text extraction
if not doc.is_extractable:
    raise PDFTextExtractionNotAllowed
# Create a PDF resource manager to hold shared resources
resource = PDFResourceManager()
# Layout analysis parameters
laparam = LAParams()
# Create an aggregator device
device = PDFPageAggregator(resource, laparams=laparam)
# Create a PDF page interpreter
interpreter = PDFPageInterpreter(resource, device)
# Iterate over the pages of the document
for page in doc.get_pages():
    # Process the page with the interpreter
    interpreter.process_page(page)
    # Fetch the page layout from the aggregator
    layout = device.get_result()
    for out in layout:
        # Only text-bearing layout objects have get_text()
        if hasattr(out, "get_text"):
            print(out.get_text())

(2) Merging several PDF files into one

Install PyPDF2:

# pip install pypdf2

import PyPDF2
import os

# Note: PdfFileReader, PdfFileWriter, numPages, getPage and addPage are the
# names used by PyPDF2 releases before 3.0; PyPDF2 3.0 replaced them with
# PdfReader, PdfWriter, len(reader.pages), reader.pages[i] and add_page.

# Build a list that holds the PDF file names.
# Rejected approach: collecting the names with os.listdir. Sorted as strings,
# the names come out in ASCII order rather than chapter order, for example
# chapter1.pdf, chapter10.pdf, chapter11.pdf and so on.
# for fileName in os.listdir(r'C:\Users\Shinelon\PycharmProjects\Python3\datachuli\aminglinux'):  # walk the files in the folder
#     if fileName.endswith('.pdf'):  # keep only files ending in .pdf
#         pdfFiles.append(fileName)  # add the PDF to the pdfFiles list

# This approach works, although there is probably a better one.
pdfFiles = []
for i in range(1, 27):
    pdfFiles.append("chapter{0}.pdf".format(i))

os.chdir(r"C:\Users\Shinelon\PycharmProjects\Python3\datachuli\aminglinux")
pdfWriter = PyPDF2.PdfFileWriter()  # start with an empty PDF writer

for pdf in pdfFiles:
    pdfReader = PyPDF2.PdfFileReader(open(pdf, 'rb'))  # open each PDF in binary read mode
    for pageNum in range(pdfReader.numPages):
        print(pdfReader.getPage(pageNum))
        pdfWriter.addPage(pdfReader.getPage(pageNum))  # copy the pages one by one into the new PDF

pdfOutput = open('combine.pdf', 'wb')  # create combine.pdf
pdfWriter.write(pdfOutput)  # write all copied pages to combine.pdf
pdfOutput.close()
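
The script above hard-codes chapter1.pdf through chapter26.pdf because a plain string sort puts chapter10.pdf before chapter2.pdf. One possible alternative, sketched here with a made-up helper named natural_key and an in-memory list standing in for the directory listing, is to sort the names with a key that compares the digit runs as integers.

```python
import re


def natural_key(name):
    """Split a name into text and number chunks so that 2 sorts before 10."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the chunks at odd positions are always the digit runs.
    chunks = re.split(r"(\d+)", name)
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]


# Stand-in for os.listdir(...); the names are illustrative only
names = ["chapter10.pdf", "chapter2.pdf", "chapter1.pdf", "notes.txt", "chapter11.pdf"]
pdf_files = sorted((n for n in names if n.endswith(".pdf")), key=natural_key)
print(pdf_files)
```

With a key like this, the list could be built from the directory contents instead of a fixed range, at the cost of picking up any other PDF that happens to be in the folder.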

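4. Processing images with Python

Image processing is a widely used technique, and Python, with its rich set of third-party libraries, naturally covers it.

PIL (Python Imaging Library) is the most commonly used image-processing library in Python. On Python 2.x it can be downloaded from http://www.pythonware.com/products/pil/index.htm; pick the build that matches your version.

[Note] On Python 3.x, PIL has been replaced by the Pillow package. Documentation:

http://pillow.readthedocs.io/en/latest/

The package can be installed directly with

pip install pillow

Import it with from PIL import Image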

A simple example:

from PIL import Image
image = Image.open("img.jpg")
print(image.format, image.size, image.mode)
image.show()

Output:

JPEG (580, 326) RGB

The image is also opened and displayed.

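As the example shows, an Image has three attributes:

format: the source format of the image. If the image was not read from a file, it is None.

size: a two-element tuple giving the width and height in pixels.

mode: RGB (true color image); other modes include L (luminance) and CMYK (pre-press image).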

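Image methods:

show(): displays the image

open(infilename): opens an image file

save(outfilename): saves the image to a file

crop((left, upper, right, lower)): extracts a rectangular region from the image. It takes a four-element tuple (left, upper, right, lower); the origin (0, 0) of the coordinate system is the upper-left corner. [This is cropping.]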

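Geometric operations on an Image:

out = im.resize((128, 128)) # resize the image to 128 x 128 pixels

out = im.rotate(45) # rotate 45 degrees counterclockwise

out = im.transpose(Image.FLIP_LEFT_RIGHT) # mirror left to right

out = im.transpose(Image.FLIP_TOP_BOTTOM) # mirror top to bottom

out = im.transpose(Image.ROTATE_90) # rotate 90 degrees counterclockwise

out = im.transpose(Image.ROTATE_180) # rotate 180 degrees

out = im.transpose(Image.ROTATE_270) # rotate 270 degrees counterclockwise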
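
To make the transpose operations concrete without needing an image file, here is a toy model on a nested list of "pixels". It is only an illustration of what the flips and the counterclockwise 90-degree rotation do to pixel positions, not Pillow's implementation, and the function names are invented for this sketch.

```python
def flip_left_right(grid):
    """Mirror every row, as Image.FLIP_LEFT_RIGHT does to an image."""
    return [row[::-1] for row in grid]


def flip_top_bottom(grid):
    """Reverse the order of the rows, as Image.FLIP_TOP_BOTTOM does."""
    return [list(row) for row in grid[::-1]]


def rotate_90(grid):
    """Rotate counterclockwise by 90 degrees, as Image.ROTATE_90 does.

    The right-hand column of the input becomes the top row of the output.
    """
    return [list(col) for col in zip(*grid)][::-1]


# A 3-wide, 2-high "image"
pixels = [[1, 2, 3],
          [4, 5, 6]]

print(flip_left_right(pixels))
print(flip_top_bottom(pixels))
print(rotate_90(pixels))
```

Note that the rotated grid is 2 wide and 3 high: a 90-degree rotation swaps width and height, which is also what happens to image.size.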

Example 1: Cropping

Image (img.jpg):

Script:

from PIL import Image
image = Image.open("img.jpg")
print(image.format, image.size, image.mode)
box = (170, 0, 390, 260)
region = image.crop(box)
region.save("cutting.jpg")

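Cropping process:

Explanation: the code above crops the rectangle whose corners are (170, 0), (170, 260), (390, 0) and (390, 260), a region 220 pixels wide and 260 pixels high, and saves it to cutting.jpg.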
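
The relationship between the (left, upper, right, lower) box and the corner points can be checked with a few lines of plain Python. This is a sketch with hypothetical helper names (box_corners, box_size, box_fits); it uses the box from the script and the 580 x 326 image size printed earlier.

```python
def box_corners(box):
    """Return the four corner points of a (left, upper, right, lower) box."""
    left, upper, right, lower = box
    return [(left, upper), (left, lower), (right, upper), (right, lower)]


def box_size(box):
    """Return (width, height) of the cropped region in pixels."""
    left, upper, right, lower = box
    return (right - left, lower - upper)


def box_fits(box, image_size):
    """Check that the box lies inside an image of the given (width, height)."""
    left, upper, right, lower = box
    width, height = image_size
    return 0 <= left < right <= width and 0 <= upper < lower <= height


box = (170, 0, 390, 260)
print(box_corners(box))
print(box_size(box))
print(box_fits(box, (580, 326)))
```

A check like box_fits can be useful before cropping, because a box that reaches outside the image does not describe a region fully covered by the source pixels.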

Result:

The mightiest destroyer in history. Take a look ~_(:3 」∠)_

Example 2: Pasting

Crop a region out of the image, rotate it 180 degrees, and paste it back onto the image.

from PIL import Image
image = Image.open("img.jpg")
print(image.format, image.size, image.mode)
box = (170, 0, 390, 260)
region = image.crop(box)
region.save("cutting.jpg")
region = region.transpose(Image.ROTATE_180)
image.paste(region, box)
image.show()

Result:

Example 3: Scaling

from PIL import Image
infile = "img.jpg"
outfile = "img2.jpg"
image = Image.open(infile)
(x, y) = image.size
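newx = 300      # target width after shrinking
newy = int(y*newx/x)   # height that keeps the original aspect ratio
out = image.resize((newx, newy), Image.LANCZOS)  # LANCZOS is the filter formerly named ANTIALIAS, which Pillow 10 removed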
out.show()
out.save(outfile)

Comparison:

Scaled image:
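
The only arithmetic in Example 3 is the aspect-ratio calculation for the new height. As a rough sketch, it can be pulled into small helpers (the names scale_to_width and scale_to_height are invented here) that use integer division, which gives the same floor result as the int(...) call for positive sizes while avoiding floating-point rounding.

```python
def scale_to_width(size, new_width):
    """Return (new_width, new_height) keeping the aspect ratio of size."""
    width, height = size
    return (new_width, height * new_width // width)


def scale_to_height(size, new_height):
    """Return (new_width, new_height) keeping the aspect ratio of size."""
    width, height = size
    return (width * new_height // height, new_height)


original = (580, 326)  # the size printed for img.jpg earlier
print(scale_to_width(original, 300))
print(scale_to_height(original, 163))
```

The resulting tuple is what would be passed as the first argument of image.resize.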

Example 4: CAPTCHA (wrapped in a class)

The code:

import random
import string
from PIL import Image, ImageDraw, ImageFont, ImageFilter


class VerCode(object):
    def __init__(self):
        # Path to the font file; it differs between systems
        self.font_path = 'consolai.ttf'
        # Number of characters in the CAPTCHA
        self.number = 4
        # Width and height of the CAPTCHA image
        self.size = (100, 30)
        # Background color, white by default
        self.bgcolor = (255, 255, 255)
        # Font color, blue by default
        self.fontcolor = (0, 0, 255)
        # Interference line color, red by default
        self.linecolor = (255, 0, 0)
        # Whether to draw interference lines
        self.draw_line = True
        # Number of interference lines to draw
        self.line_number = 20

    # Generate a random string of letters and digits
    def gene_text(self):
        self.source = list(string.ascii_letters + string.digits)
        return ''.join(random.sample(self.source, self.number))  # number is the CAPTCHA length

    # Draw one interference line between two random points
    def gene_line(self, draw, width, height):
        self.begin = (random.randint(0, width), random.randint(0, height))
        self.end = (random.randint(0, width), random.randint(0, height))
        draw.line([self.begin, self.end], fill=self.linecolor)

    # Generate the CAPTCHA image
    def gene_code(self):
        self.width, self.height = self.size  # width and height
        self.image = Image.new('RGBA', (self.width, self.height), self.bgcolor)  # create the image
        self.font = ImageFont.truetype(self.font_path, 25)  # CAPTCHA font
        self.draw = ImageDraw.Draw(self.image)  # create the drawing object
        self.text = self.gene_text()                 # generate the string
        # font.getsize() was removed in Pillow 10; getbbox() returns (left, top, right, bottom)
        left, top, right, bottom = self.font.getbbox(self.text)
        self.font_width, self.font_height = right - left, bottom - top
        self.draw.text(((self.width - self.font_width) / self.number, (self.height - self.font_height) / self.number), self.text, font=self.font, fill=self.fontcolor)  # draw the string
        if self.draw_line:
            for i in range(self.line_number):
                self.gene_line(self.draw, self.width, self.height)

    # Apply a filter and save the image
    def effect(self):
        #self.image = self.image.transform((self.width + 20, self.height + 10), Image.AFFINE, (1, -0.3, 0, -0.1, 1, 0), Image.BILINEAR)  # apply a distortion
        self.image = self.image.filter(ImageFilter.EDGE_ENHANCE_MORE)  # filter: stronger edge enhancement
        self.image.save('idencode.png')  # save the CAPTCHA image
        #self.image.show()


if __name__ == "__main__":
    vco = VerCode()
    vco.gene_code()
    vco.effect()

Result:
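
One detail of gene_text is worth spelling out: random.sample draws without replacement, so a generated code never contains the same character twice, and the random module is not intended for security-sensitive values. If either matters, a possible variant, shown here as a sketch with a hypothetical function name captcha_text, picks each character independently with the standard secrets module.

```python
import secrets
import string

# The same 62 characters that gene_text draws from
ALPHABET = string.ascii_letters + string.digits


def captcha_text(length=4, alphabet=ALPHABET):
    """Pick each character independently, so repeated characters are allowed."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


text = captcha_text()
print(text)
```

Whether repeated characters are desirable is a design choice; the class above would work the same way with either generator, since only the returned string is drawn.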

Article link: https://www.lsjlt.com/news/189460.html (please cite this link when reposting)
