How to Extract Specific Content from XML with Python

Tags: python extract xml content, python extract content, extract specific xml content | 2023-01-03 18:01:47 | 109 views | 薄情痞子


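Contents

Method 1: Working with an XML file in Python
Extracting a single field
Extracting every value of a tag and writing it to a text file
Method 2: Extracting specific XML content with regular expressions
Summary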

Method 1: Working with an XML file in Python

The sample is an XML file picked at random (the web.xml that ships with Jenkins):

<?xml version="1.0" encoding="UTF-8"?>
<!--
The MIT License
Copyright (c) 2004-2009, Sun Microsystems, Inc., Kohsuke Kawaguchi, Tom Huybrechts, id:digerata, Yahoo! Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
-->
 
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
         version="3.1"
         metadata-complete="true">
  <display-name>Jenkins v2.336</display-name>
  <description>Build management system</description>
 
  <servlet>
    <servlet-name>Stapler</servlet-name>
    <servlet-class>org.kohsuke.stapler.Stapler</servlet-class>
    <init-param>
      <param-name>default-encodings</param-name>
      <param-value>text/html=UTF-8</param-value>
    </init-param>
    <init-param>
      <param-name>diagnosticThreadName</param-name>
      <param-value>false</param-value>
    </init-param>
    <async-supported>true</async-supported>
  </servlet>
 
  <servlet-mapping>
    <servlet-name>Stapler</servlet-name>
    <url-pattern>/*</url-pattern>
  </servlet-mapping>
 
  <filter>
    <filter-name>suspicious-request-filter</filter-name>
    <filter-class>jenkins.security.SuspiciousRequestFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>diagnostic-name-filter</filter-name>
    <filter-class>org.kohsuke.stapler.DiagnosticThreadNameFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>encoding-filter</filter-name>
    <filter-class>hudson.util.CharacterEncodingFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>compression-filter</filter-name>
    <filter-class>org.kohsuke.stapler.compression.CompressionFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>authentication-filter</filter-name>
    <filter-class>hudson.security.HudsonFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>csrf-filter</filter-name>
    <filter-class>hudson.security.csrf.CrumbFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>plugins-filter</filter-name>
    <filter-class>hudson.util.PluginServletFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
 
  <!--
	The Headers filter allows us to override headers sent by the container
	that may be in conflict with what we want.  For example, Tomcat will set
	Cache-Control: no-cache for any files behind the security-constraint
	below.  So if Hudson is on a public server, and you want to only allow
	authorized users to access it, you may want to pay attention to this.
	
	See: http://www.nabble.com/No-browser-caching-with-Hudson- -tf4601857.html
  
  <filter>
    <filter-name>change-headers-filter</filter-name>
    <filter-class>hudson.ResponseHeaderFilter</filter-class>
    <!- The value listed here is for 24 hours.  Increase or decrease as you see 
    fit.  Value is in seconds. Make sure to keep the public option ->
    <init-param>
      <param-name>Cache-Control</param-name>
      <param-value>max-age=86400, public</param-value>
    </init-param>
    <!- It turns out that Tomcat just doesn't want to let
    go of its cache option.  If you override Cache-Control,
    it starts to send Pragma: no-cache as a backup.
     ->
    <init-param>
      <param-name>Pragma</param-name>
      <param-value>public</param-value>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>change-headers-filter</filter-name>
    <url-pattern>*.css</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>change-headers-filter</filter-name>
    <url-pattern>*.gif</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>change-headers-filter</filter-name>
    <url-pattern>*.js</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>change-headers-filter</filter-name>
    <url-pattern>*.png</url-pattern>
  </filter-mapping>
  -->
 
  <filter-mapping>
    <filter-name>suspicious-request-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>diagnostic-name-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>encoding-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>compression-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>authentication-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>csrf-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>plugins-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
 
  <listener>
    <!-- Must be before WebAppMain in order to initialize the context before the first use of this class. -->
    <listener-class>jenkins.util.SystemProperties$Listener</listener-class>
  </listener>
  <listener>
    <listener-class>hudson.WebAppMain</listener-class>
  </listener>
  <listener>
    <listener-class>jenkins.JenkinsHttpSessionListener</listener-class>
  </listener>
 
  <!--
    JENKINS-1235 suggests containers interpret '*' as "all roles defined in web.xml"
    as opposed to "all roles defined in the security realm", so we need to list some
    common names in the hope that users will have at least one of those roles.
  -->
  <security-role>
    <role-name>admin</role-name>
  </security-role>
  <security-role>
    <role-name>user</role-name>
  </security-role>
  <security-role>
    <role-name>hudson</role-name>
  </security-role>
 
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Hudson</web-resource-name>
      <url-pattern>/loginEntry</url-pattern>
      <!--http-method>GET</http-method-->
    </web-resource-collection>
    <auth-constraint>
      <role-name>**</role-name>
    </auth-constraint>
  </security-constraint>
  
  <!-- Disable TRACE method with security constraint (copied from jetty/webdefaults.xml) -->
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Disable TRACE</web-resource-name>
      <url-pattern>/*</url-pattern>
      <http-method>TRACE</http-method>
    </web-resource-collection>
    <auth-constraint />
  </security-constraint>
  
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>other</web-resource-name>
      <url-pattern>/*</url-pattern>
    </web-resource-collection>
    <!-- no security constraint --> 
  </security-constraint>
 
  <login-config>
    <auth-method>FORM</auth-method>
    <form-login-config>
      <form-login-page>/login</form-login-page>
      <form-error-page>/loginError</form-error-page>
    </form-login-config>
  </login-config>
 
 
  <!-- if specified, this value is used as the Hudson home directory -->
  <env-entry>
    <env-entry-name>HUDSON_HOME</env-entry-name>
    <env-entry-type>java.lang.String</env-entry-type>
    <env-entry-value></env-entry-value>
  </env-entry>
 
  <!-- configure additional extension-content-type mappings -->
  <mime-mapping>
    <extension>xml</extension>
    <mime-type>application/xml</mime-type>
  </mime-mapping>
  <!--mime-mapping> commenting out until this works out of the box with JOnAS. See  http://www.nabble.com/Error-with-mime-type%2D-%27application-xslt%2Bxml%27-when-deploying-hudson-1.316-in-jonas-td24740489.html
    <extension>xsl</extension>
    <mime-type>application/xslt+xml</mime-type>
  </mime-mapping-->
  <mime-mapping>
    <extension>log</extension>
    <mime-type>text/plain</mime-type>
  </mime-mapping>
  <mime-mapping>
    <extension>war</extension>
    <mime-type>application/octet-stream</mime-type>
  </mime-mapping>
  <mime-mapping>
    <extension>ear</extension>
    <mime-type>application/octet-stream</mime-type>
  </mime-mapping>
  <mime-mapping>
    <extension>rar</extension>
    <mime-type>application/octet-stream</mime-type>
  </mime-mapping>
  <mime-mapping>
    <extension>webm</extension>
    <mime-type>video/webm</mime-type>
  </mime-mapping>
 
  <error-page>
    <exception-type>java.lang.Throwable</exception-type>
    <location>/oops</location>
  </error-page>
 
  <session-config>
    <cookie-config>
      <!-- See https://www.owasp.org/index.php/HttpOnly for the discussion of this topic in OWASP -->
      <http-only>true</http-only>
    </cookie-config>
    <!-- Tracking mode is managed by WebAppMain.FORCE_SESSION_TRACKING_BY_COOKIE_PROP -->
  </session-config>
</web-app>

Extracting a single field

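# coding=utf-8
"""
    Author: gaojs
    Purpose: print the text of the first <filter-name> element in web.xml
    Date: 2022/6/2 17:12
"""
import xml.dom.minidom


# Parse the file into a DOM tree and take the root element (<web-app>).
dom = xml.dom.minidom.parse('web.xml')
root = dom.documentElement
# Collect every <filter-name> element, in document order.
filter_list = root.getElementsByTagName('filter-name')

# Print the text of the first match.
print(filter_list[0].firstChild.data)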

Output:

suspicious-request-filter
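
The same lookup can also be done with the standard library's xml.etree.ElementTree. The sketch below is an alternative to the article's minidom code, not a replacement for it. It assumes a cut-down inline stand-in for web.xml (the SAMPLE string) and an arbitrary prefix name, jee, for the document's default namespace; ElementTree needs that namespace spelled out, whereas minidom's getElementsByTagName matches on the plain tag name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for web.xml; it keeps the same default namespace.
SAMPLE = """<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="3.1">
  <filter>
    <filter-name>suspicious-request-filter</filter-name>
  </filter>
  <filter>
    <filter-name>encoding-filter</filter-name>
  </filter>
</web-app>
"""

# Map an arbitrary prefix ("jee" is our own choice) to the default namespace URI.
NS = {"jee": "http://xmlns.jcp.org/xml/ns/javaee"}

root = ET.fromstring(SAMPLE)

# find() returns the first matching element in document order, or None.
first = root.find(".//jee:filter-name", NS)
first_filter_name = first.text if first is not None else None

# findall() returns every match, which covers the batch case as well.
all_filter_names = [e.text for e in root.findall(".//jee:filter-name", NS)]

print(first_filter_name)
print(all_filter_names)
```

For the real file, ET.parse('web.xml').getroot() would take the place of ET.fromstring(SAMPLE).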

Extracting every value of a tag and writing it to a text file

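# coding=utf-8
"""
    Author: gaojs
    Purpose: write the text of every <filter-name> element in web.xml to a file
    Date: 2022/6/2 17:12
"""
import xml.dom.minidom


dom = xml.dom.minidom.parse('web.xml')
root = dom.documentElement
filter_list = root.getElementsByTagName('filter-name')

# Open the output file once instead of reopening it for every element.
# Mode 'w' starts from an empty file, so reruns do not pile up duplicates.
with open('filter_result.txt', 'w', encoding='utf-8') as fout:
    for node in filter_list:
        s = node.firstChild.data
        print(s)
        fout.write(s + '\n')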

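File contents: the seven filter names, each appearing twice (once from its <filter> definition and once from its <filter-mapping>), 14 lines in total.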
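
One caveat with firstChild.data: an empty element such as <env-entry-value></env-entry-value> in the sample has no child node, so firstChild is None and the attribute access raises AttributeError. A minimal sketch of a guard follows; the helper name element_text and the inline SAMPLE string are illustrative assumptions, not part of the original code.

```python
import xml.dom.minidom

# Hypothetical, cut-down stand-in for the <env-entry> part of web.xml.
SAMPLE = """<web-app>
  <env-entry>
    <env-entry-name>HUDSON_HOME</env-entry-name>
    <env-entry-value></env-entry-value>
  </env-entry>
</web-app>
"""


def element_text(node):
    """Return the concatenated text of a DOM element, or '' if it has none."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


dom = xml.dom.minidom.parseString(SAMPLE)
names = [element_text(n) for n in dom.getElementsByTagName("env-entry-name")]
values = [element_text(n) for n in dom.getElementsByTagName("env-entry-value")]

print(names)
print(values)
```

Joining all text children also covers elements whose text is split across several nodes, which firstChild alone would truncate.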

Method 2: Extracting specific XML content with regular expressions

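import re

# Read the file as plain text; no XML parsing happens here.
with open('web.xml', mode='r', encoding='utf-8') as fin:
    text = fin.read()

# Non-greedy match of the text between each <filter-name> pair.
result = re.findall(r'<filter-name>(.*?)</filter-name>', text)

# The 'array' directory must already exist.
with open('array/filter_result.txt', 'a', encoding='utf-8') as f:
    for key in result:
        print(key)
        f.write(key + '\n')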

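Result: 19 matches rather than the 14 from Method 1, because the regex scans the raw text and also picks up the five change-headers-filter entries inside the commented-out block.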
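
The two methods do not always agree. A regular expression scans raw text, so it also matches tags inside XML comments, while a parser skips them. The sketch below shows the difference on a small inline document (SAMPLE is a hypothetical stand-in for web.xml) and one possible workaround, which is to strip comments before matching. It assumes the tags carry no attributes and contain no nested markup.

```python
import re
import xml.dom.minidom

# Hypothetical, cut-down stand-in for web.xml with one commented-out filter.
SAMPLE = """<web-app>
  <filter>
    <filter-name>encoding-filter</filter-name>
  </filter>
  <!--
  <filter>
    <filter-name>change-headers-filter</filter-name>
  </filter>
  -->
</web-app>
"""

PATTERN = r"<filter-name>(.*?)</filter-name>"

# Regex over the raw text: the commented-out entry is matched too.
regex_names = re.findall(PATTERN, SAMPLE)

# DOM parsing: the comment is a single Comment node, so its content is not an element.
dom = xml.dom.minidom.parseString(SAMPLE)
dom_names = [n.firstChild.data for n in dom.getElementsByTagName("filter-name")]

# Workaround: remove comments first (DOTALL lets '.' span line breaks).
without_comments = re.sub(r"<!--.*?-->", "", SAMPLE, flags=re.DOTALL)
regex_names_clean = re.findall(PATTERN, without_comments)

print(regex_names)
print(dom_names)
print(regex_names_clean)
```

When the input is well-formed XML, the parser-based method is usually the safer default; the regex is handy for quick one-off extraction or for files that do not parse.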

Summary

These notes come from personal experience; I hope they serve as a useful reference.


Source: https://www.lsjlt.com/news/176567.html (please cite the source link when reposting)
