How to install parsel in Python

Asked by a site user on 2022-03-03 19:49


1 answer

Answered by a site user on 2022-03-03 21:19

python-parsel

Parsel is a library for extracting data from HTML and XML using XPath and CSS selectors, optionally combined with regular expressions.

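I. Installation

PyPI page: https://pypi.org/project/parsel/

Install with pip: pip install parsel (installs the latest version by default)

To pin a specific version: pip install parsel==1.6.0 (1.6.0 was the latest official release at the time of writing; note the double equals sign)

PyCharm: File => Settings => Project: sintemple => Project Interpreter => click the plus sign in the upper-right corner (or press Alt+Insert) => type parsel in the search box; a list containing only parsel appears, select it => Install Package, then wait for the installation to finish. (Note: sintemple is the project name used in this example. Ticking "Specify version" lets you pick a version from the drop-down list.)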
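After installing, it can help to confirm that the package is visible to the interpreter you are actually using. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is made up for this illustration and is not part of parsel or pip.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper written for this example.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("parsel")
    if version is None:
        print("parsel is not installed; try: pip install parsel")
    else:
        print(f"parsel {version} is installed")
```

If the script reports that parsel is missing even though pip succeeded, pip and the interpreter probably belong to different environments; running python -m pip install parsel with the same interpreter usually resolves that.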

————————————————

II. Selector

Selector(text=None, type=None, namespaces=None, root=None, base_url=None, _expr=None)

Creates an object that parses HTML or XML text.

Parameters:

text: a unicode object in Python 2, or a str object in Python 3.

type: the selector type. It can be "html", "xml", or None (the default). If it is None, "html" is used.

base_url: sets a URL for the document. This is needed when looking up external entities with relative paths.

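Methods of Selector objects

①. Selector.attrib

Returns the attributes dictionary of the underlying element. attrib is a property, so it is accessed without parentheses.

②. Selector.css(query)

Applies the given CSS selector and returns a SelectorList with the matching nodes.

③. Selector.get()

Serializes the matched node and returns it as a single unicode string.

④. Selector.getall()

Serializes the matched node and returns it as a 1-element list of unicode strings.

⑤. Selector.re(self, regex, replace_entities=True)

Applies the given regular expression and returns a list of unicode strings with the matches.

⑥. Selector.re_first(self, regex, default=None, replace_entities=True)

Applies the given regex and returns the first match. If the list is empty or the regex doesn't match anything, the default value is returned (None if the argument is not provided).

⑦. Selector.remove()

Removes the matched node from its parent element.

⑧. Selector.xpath(self, query, namespaces=None, **kwargs)

Applies the given XPath expression and returns a SelectorList with the matching nodes.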

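Methods of SelectorList objects

The SelectorList class is a subclass of the built-in list class that provides a few additional methods.

①. attrib Returns the attributes dictionary of the first element. If the list is empty, returns an empty dict.

②. css(query) Calls the .css() method on each element in this list and returns the results flattened as another SelectorList. query is the same argument as in Selector.css().

③. extract() Calls the .get() method on each element in this list and returns the results flattened, as a list of unicode strings.

④. extract_first(default=None) Returns the result of .get() for the first element in this list. If the list is empty, returns the default value.

⑤. get(default=None) Returns the result of .get() for the first element in this list. If the list is empty, returns the default value.

⑥. getall() Calls the .get() method on each element in this list and returns the results flattened, as a list of unicode strings.

⑦. re(regex, replace_entities=True) Calls the .re() method on each element in this list and returns the results flattened, as a list of unicode strings. By default, character entity references are replaced by their corresponding characters (except for &amp; and &lt;). Pass replace_entities=False to switch off these replacements.

⑧. re_first(regex, default=None, replace_entities=True) Calls the .re() method on the first element in this list and returns the result as a unicode string. If the list is empty or the regex doesn't match anything, returns the default value (None if the argument is not provided). By default, character entity references are replaced by their corresponding characters (except for &amp; and &lt;). Pass replace_entities=False to switch off these replacements.

⑨. remove() Removes the matched node of each element in this list from its parent.

⑩. xpath(xpath, namespaces=None, **kwargs) Calls the .xpath() method on each element in this list and returns the results flattened as another SelectorList. The query is the same argument as in Selector.xpath(). namespaces is an optional mapping (dict) of prefixes to namespace URIs, added to the prefixes already registered with register_namespace(prefix, uri). Unlike register_namespace(), these prefixes are not saved for future calls.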

Example:

HTML code

————————————————

III. csstranslator

TranslatorMixin

This mixin adds support to CSS pseudo elements via dynamic dispatch. Currently supported pseudo-elements are ::text and ::attr(ATTR_NAME).

①. xpath_attr_functional_pseudo_element(xpath, function)

Support selecting attribute values using ::attr() pseudo-element

②. xpath_element(selector)

③. xpath_pseudo_element(xpath, pseudo_element)

Dispatch method that transforms XPath to support pseudo-element

④. xpath_text_simple_pseudo_element(xpath)

Support selecting text nodes using ::text pseudo-element

XPathExpr(path='', element='*', condition='', star_prefix=False)

GenericTranslator

HTMLTranslator(xhtml=False)

IV. utils

extract_regex(regex, text, replace_entities=True)

Extract a list of unicode strings from the given text/encoding using the following policies:
* if the regex contains a named group called "extract", that will be returned
* if the regex contains multiple numbered groups, all those will be returned (flattened)
* if the regex doesn't contain any group, the entire regex matching is returned

flatten(sequence) → list

Returns a single, flat list which contains all elements retrieved from the sequence and all recursively contained sub-sequences (iterables). Examples:

>>> [1, 2, [3,4], (5,6)]
[1, 2, [3, 4], (5, 6)]
>>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, (8,9,10)])
[1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]
>>> flatten(["foo", "bar"])
['foo', 'bar']
>>> flatten(["foo", ["baz", 42], "bar"])
['foo', 'baz', 42, 'bar']

iflatten(sequence) → Iterator

Similar to .flatten(), but returns iterator instead
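The relationship between the two helpers can be illustrated with a pure-Python sketch. This is not parsel's source code: the names flatten_sketch and iflatten_sketch are invented here, and the rule that strings and bytes count as single items is an assumption based on the documented examples.

```python
from typing import Any, Iterable, Iterator, List


def iflatten_sketch(sequence: Iterable[Any]) -> Iterator[Any]:
    """Lazily yield the items of nested iterables, depth first."""
    for item in sequence:
        # Strings and bytes are iterable, but are treated as single items.
        if hasattr(item, "__iter__") and not isinstance(item, (str, bytes)):
            yield from iflatten_sketch(item)
        else:
            yield item


def flatten_sketch(sequence: Iterable[Any]) -> List[Any]:
    """Eager variant: collect everything the iterator yields into a list."""
    return list(iflatten_sketch(sequence))


nested = [[[1, 2, 3], (42, None)], [4, 5], [6], 7, (8, 9, 10)]
print(flatten_sketch(nested))
print(flatten_sketch(["foo", ["baz", 42], "bar"]))
```

The iterator form avoids building the whole list up front, which matters only when the nested input is large or consumed partially.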

shorten(text, width, suffix='...')

Truncate the given text to fit in the given width.

————————————————
