Enhanced data structures in Python's collections module


Published: 2019-05-26


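collections is a built-in Python module that provides a number of useful container classes, most of them enhanced versions of the common data structures. They exist because plain tuples, lists and dicts don't always fit a specific need well. Typical examples are counting items, adding and removing at both ends of a sequence, and supplying a default value for a missing key.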

Counter

Counting list elements

Plain-dict implementation

```python
>>> word_list = ["a", "b", "c", "c", "a", "a"]
>>> cnt = {}
>>> for word in set(word_list):
...     cnt[word] = word_list.count(word)
...
>>> cnt
{'b': 1, 'c': 2, 'a': 3}
>>> cnt['d']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'd'
```

Counter implementation

```python
>>> from collections import Counter
>>> cnt = Counter()
>>> word_list = ['a', 'b', 'c', 'c', 'a', 'a']
>>> for word in word_list:
...     cnt[word] += 1
...
>>> cnt
Counter({'a': 3, 'c': 2, 'b': 1})
>>> cnt['a']
3
>>> cnt['d']  # a missing key returns 0 instead of raising KeyError, much like defaultdict(int)
0
```
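The explicit loop above is not strictly necessary. The sketch below passes the list straight to the constructor and adds more counts later with update(). It also shows the one behavioural difference from defaultdict(int) hinted at in the comment above. The sample keys "d" and "z" are just illustrative values.

```python
from collections import Counter, defaultdict

word_list = ["a", "b", "c", "c", "a", "a"]

# Passing the iterable to the constructor counts it in one call
cnt = Counter(word_list)
print(cnt)  # Counter({'a': 3, 'c': 2, 'b': 1})

# update() adds to existing counts instead of replacing them (unlike dict.update)
cnt.update(["b", "d"])
print(cnt["b"], cnt["d"])  # 2 1

# Both Counter and defaultdict(int) return 0 for a missing key,
# but only defaultdict stores the key as a side effect of the lookup
dd = defaultdict(int)
print(cnt["z"], "z" in cnt)  # 0 False
print(dd["z"], "z" in dd)    # 0 True
```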

Counting characters in a string

Plain-dict implementation

```python
>>> word_str = 'hello world'
>>> word_list = list(word_str)
>>> cnt = {}
>>> for word in set(word_list):
...     cnt[word] = word_list.count(word)
...
>>> cnt
{'e': 1, 'd': 1, 'h': 1, 'o': 2, 'l': 3, ' ': 1, 'r': 1, 'w': 1}
```

Counter implementation

```python
>>> from collections import Counter
>>> word_str = 'hello world'
>>> cnt = Counter(word_str)           # from any iterable
>>> cnt
Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})
>>> Counter({'red': 4, 'blue': 2})    # from a mapping
Counter({'red': 4, 'blue': 2})
>>> Counter(red=4, blue=2)            # from keyword arguments
Counter({'red': 4, 'blue': 2})
```
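Counter accepts any iterable, so you can feed it a generator expression to filter what gets counted. The sketch below counts only letters and ignores case. The sample text "Hello World" is an assumption chosen to match the example above.

```python
from collections import Counter

text = "Hello World"

# Lower-case first, then skip anything that is not a letter (the space here)
letters = Counter(ch for ch in text.lower() if ch.isalpha())
print(letters)  # Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, 'w': 1, 'r': 1, 'd': 1})
```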

Counter.elements()

```python
>>> cnt = Counter(red=4, blue=2)
>>> cnt
Counter({'red': 4, 'blue': 2})
>>> list(cnt.elements())
['red', 'red', 'red', 'red', 'blue', 'blue']
```
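One detail worth knowing is that elements() silently skips any key whose count is zero or negative. The sketch adds hypothetical green and yellow keys to show this. It also uses unary +, which drops non-positive counts, to confirm the round trip.

```python
from collections import Counter

cnt = Counter(red=4, blue=2, green=0, yellow=-1)

# Each key is repeated by its count; green (0) and yellow (-1) are skipped
items = list(cnt.elements())
print(items)  # ['red', 'red', 'red', 'red', 'blue', 'blue']

# Counting the expanded elements gives back only the positive counts
print(Counter(items) == +cnt)  # True
```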

Counter.most_common()

```python
>>> cnt = Counter('hello world')
>>> cnt
Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})
>>> cnt.most_common()
[('l', 3), ('o', 2), ('h', 1), ('e', 1), (' ', 1), ('w', 1), ('r', 1), ('d', 1)]
>>> cnt.most_common(3)
[('l', 3), ('o', 2), ('h', 1)]
```
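most_common() has no "least common" counterpart. The Python documentation suggests taking a reversed slice of the full list instead. A small sketch, where n = 2 is an arbitrary choice:

```python
from collections import Counter

cnt = Counter("hello world")

# most_common() is sorted from highest to lowest count;
# the slice [:-n - 1:-1] walks it backwards and takes n items
n = 2
least = cnt.most_common()[:-n - 1:-1]
print(least)  # [('d', 1), ('r', 1)]
```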

Counter.subtract()

```python
>>> a = Counter(a=4, b=2, c=0, d=-2)
>>> a
Counter({'a': 4, 'b': 2, 'c': 0, 'd': -2})
>>> b = Counter(a=1, b=2, c=-3, d=4)
>>> b
Counter({'d': 4, 'b': 2, 'a': 1, 'c': -3})
>>> a.subtract(b)
>>> a
Counter({'a': 3, 'c': 3, 'b': 0, 'd': -6})
```
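subtract() is easy to confuse with the binary - operator, and the two behave differently. subtract() modifies the Counter in place and keeps zero and negative results. a - b returns a new Counter that contains only the positive results. The sketch reuses the same inputs as above:

```python
from collections import Counter

a = Counter(a=4, b=2, c=0, d=-2)
b = Counter(a=1, b=2, c=-3, d=4)

# "-" builds a new Counter and drops results <= 0 (b -> 0, d -> -6)
diff = a - b
print(diff)  # Counter({'a': 3, 'c': 3})

# subtract() changes a in place and keeps every result
a.subtract(b)
print(a)  # Counter({'a': 3, 'c': 3, 'b': 0, 'd': -6})
```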

Common operations

Counter is a subclass of dict, so once your data is a Counter you work with it much like an ordinary dictionary.

```python
>>> from collections import Counter
>>> cnt = Counter('hello world')
>>> cnt
Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})
>>> cnt.keys()
dict_keys(['h', 'e', 'l', 'o', ' ', 'w', 'r', 'd'])
>>> cnt.values()
dict_values([1, 1, 3, 2, 1, 1, 1, 1])
>>> sum(cnt.values())
11
>>> dict(cnt)
{'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1}
>>> cnt.items()
dict_items([('h', 1), ('e', 1), ('l', 3), ('o', 2), (' ', 1), ('w', 1), ('r', 1), ('d', 1)])
>>> Counter(dict([('a', 1), ('b', 2), ('c', 3)]))
Counter({'c': 3, 'b': 2, 'a': 1})
>>> cnt.clear()
>>> cnt
Counter()
```
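Beyond the dict-style operations, Counter also supports multiset arithmetic. The sketch below uses the four operators described in the Python documentation on two small made-up Counters:

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

# Addition: counts are summed
print(c + d)  # Counter({'a': 4, 'b': 3})
# Subtraction: only positive results are kept (b would be -1)
print(c - d)  # Counter({'a': 2})
# Intersection: the minimum of each count
print(c & d)  # Counter({'a': 1, 'b': 1})
# Union: the maximum of each count
print(c | d)  # Counter({'a': 3, 'b': 2})
```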

deque

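A deque (double-ended queue) is a generalization of stacks and queues. It supports memory-efficient appends and pops at either end in approximately O(1) time. A list supports similar operations, but pop(0) and insert(0, v) cost O(n), because every remaining element has to be shifted and the list may be resized. Setting those details aside, you can treat a deque as an enhanced list. The one caveat is that indexing into the middle of a deque is O(n).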
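To make the list comparison concrete, the sketch below drains the same first-in, first-out queue in two ways. The job names are hypothetical. The results are identical. The difference lies in the per-call cost noted in the comments, which this code does not measure.

```python
from collections import deque

jobs_list = ["job1", "job2", "job3"]
jobs_deque = deque(["job1", "job2", "job3"])

# list.pop(0) shifts every remaining element left: O(n) per call
served_list = []
while jobs_list:
    served_list.append(jobs_list.pop(0))

# deque.popleft() removes from the left end: O(1) per call
served_deque = []
while jobs_deque:
    served_deque.append(jobs_deque.popleft())

print(served_list == served_deque)  # True
```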

```python
>>> from collections import deque
>>> d = deque(['a', 'b', 'c'])
>>> d
deque(['a', 'b', 'c'])
>>> d.append('d')
>>> d
deque(['a', 'b', 'c', 'd'])
>>> d.count('b')
1
>>> d.extend(['e', 'f', 'g'])
>>> d
deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> d.pop()
'g'
>>> d
deque(['a', 'b', 'c', 'd', 'e', 'f'])
>>> d.remove('d')
>>> d
deque(['a', 'b', 'c', 'e', 'f'])
>>> d.reverse()
>>> d
deque(['f', 'e', 'c', 'b', 'a'])
>>> # Operations on the left end
>>> d.popleft()
'f'
>>> d
deque(['e', 'c', 'b', 'a'])
>>> d.appendleft('h')
>>> d
deque(['h', 'e', 'c', 'b', 'a'])
>>> d.extendleft(['i', 'j', 'k'])  # items are added one at a time, so they end up reversed
>>> d
deque(['k', 'j', 'i', 'h', 'e', 'c', 'b', 'a'])
>>> # Picture an excavator's track: rotate(n) moves n items from the right end round to the left
>>> d.rotate(1)
>>> d
deque(['a', 'k', 'j', 'i', 'h', 'e', 'c', 'b'])
>>> d.rotate(2)
>>> d
deque(['c', 'b', 'a', 'k', 'j', 'i', 'h', 'e'])
```
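The session above doesn't use the optional maxlen argument. It turns a deque into a fixed-size buffer, which is handy for keeping only the last N items, such as the tail of a log. A minimal sketch with an arbitrary size of 3:

```python
from collections import deque

# Once the deque is full, appending on one end discards an item from the other end
recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)
print(recent)  # deque([3, 4, 5], maxlen=3)

# appendleft() on a full deque drops the item at the right end instead
recent.appendleft(0)
print(recent)  # deque([0, 3, 4], maxlen=3)
```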

defaultdict

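For me, the biggest advantage of defaultdict is that, as long as you give it a default_factory, reading a missing key no longer raises KeyError. Instead it calls the factory (int, list, ...) to create a default value and stores it under that key. Let's return to the list-counting example to see the difference.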

Counting list elements

Plain-dict implementation

```python
>>> word_list = ["a", "b", "c", "c", "a", "a"]
>>> cnt = {}
>>> for word in word_list:
...     if word not in cnt:
...         cnt[word] = 1
...     else:
...         cnt[word] += 1
...
>>> cnt
{'a': 3, 'b': 1, 'c': 2}
>>> cnt['d']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'd'
```

defaultdict implementation (no if/else check needed)

```python
>>> from collections import defaultdict
>>> word_list = ["a", "b", "c", "c", "a", "a"]
>>> cnt = defaultdict(int)
>>> for word in word_list:
...     cnt[word] += 1
...
>>> cnt
defaultdict(<class 'int'>, {'a': 3, 'b': 1, 'c': 2})
```
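The default factory doesn't have to be int. Any callable that takes no arguments works. A common pattern is defaultdict(list) for grouping. The sketch below groups a made-up word list by first letter without first checking whether the key exists:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# The first time a letter is seen, list() creates an empty list for it
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)

print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```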

OrderedDict

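As the name suggests, an OrderedDict is a dictionary that remembers the order in which keys were inserted. Since Python 3.7 a regular dict preserves insertion order too. OrderedDict still adds order-specific operations such as move_to_end() and popitem(last=False).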
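One difference from a plain dict is easy to miss: equality. A plain dict (which keeps insertion order since Python 3.7) ignores order when compared. Two OrderedDicts compare equal only if their keys are in the same order. The sketch uses two small sample dicts:

```python
from collections import OrderedDict

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}
print(a == b)  # True: dict equality ignores order

oa = OrderedDict(a)
ob = OrderedDict(b)
print(oa == ob)  # False: OrderedDict vs OrderedDict is order-sensitive
print(oa == b)   # True: comparing with a plain dict ignores order
```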

```python
>>> from collections import OrderedDict
>>> d = {"banana": 3, "apple": 2, "pear": 1, "orange": 4}
>>> order_dict = OrderedDict(d)
>>> order_dict
OrderedDict([('banana', 3), ('apple', 2), ('pear', 1), ('orange', 4)])
>>> order_dict.keys()
odict_keys(['banana', 'apple', 'pear', 'orange'])
>>> order_dict.values()
odict_values([3, 2, 1, 4])
>>> order_dict.items()
odict_items([('banana', 3), ('apple', 2), ('pear', 1), ('orange', 4)])
>>> # Remove an item from the end (last=True) or from the front (last=False)
>>> order_dict.popitem(last=True)
('orange', 4)
>>> order_dict
OrderedDict([('banana', 3), ('apple', 2), ('pear', 1)])
>>> order_dict.popitem(last=False)
('banana', 3)
>>> order_dict
OrderedDict([('apple', 2), ('pear', 1)])
>>> # Move an element to the end (first re-add 'orange', which was popped above)
>>> order_dict['orange'] = 4
>>> order_dict
OrderedDict([('apple', 2), ('pear', 1), ('orange', 4)])
>>> order_dict.move_to_end('apple')
>>> order_dict
OrderedDict([('pear', 1), ('orange', 4), ('apple', 2)])
```
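move_to_end() and popitem(last=False) together are exactly what a least-recently-used (LRU) cache needs. The sketch below is a minimal example. The LRUCache class and its get/put methods are hypothetical names for illustration, not part of the collections module. For caching function results, the standard library already offers functools.lru_cache.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical minimal LRU cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        # Mark the key as most recently used by moving it to the right end
        self.data.move_to_end(key)
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        # Over capacity: evict the least recently used entry from the left end
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used
cache.put("c", 3)  # evicts "b", the least recently used
print(list(cache.data))  # ['a', 'c']
```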

Reprinted from: http://jijqi.baihongyu.com/
