Python memory paging / Python memory

Reposted

jordana 2023-08-28 16:02:22


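Stack and heap memory in Python

As I understand it:

The Python interpreter also works with a stack and a heap. The stack holds call frames, and each frame has an evaluation stack on which the bytecode instructions push and pop their operands. All objects live on the heap, and variables are only names that refer to them.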
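A minimal sketch of the "objects on the heap, names as references" idea. The variable and function names are made up for illustration, and the bytecode printed by dis differs between Python versions.

```python
import dis

a = [1, 2, 3]   # the list object is allocated on the heap
b = a           # b is a second reference to the same object, not a copy
b.append(4)

same_object = a is b   # both names point at one heap object
print(same_object, a)

def add(x, y):
    return x + y

# The bytecode shows operands being loaded onto the frame's
# evaluation stack and consumed by the add instruction.
dis.dis(add)
```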

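Python memory allocation

Objects are split into large and small at a threshold of 512 bytes. Large objects are allocated directly by the system allocator, while small objects go through a dedicated small-object allocator. Small requests are rounded up to a fixed alignment and grouped into size classes so that memory is easy to reuse and manage.

First, a large Arena is requested from the system and divided into page-sized Pools. The Pool is the first level of reuse: each Pool serves objects of a single size class, and once a Pool has been freed it can serve a different size class.

Second, each Pool is divided again into Blocks of its size class. Each Block holds one small object of that class. This is the second level of reuse.

As I understand it: memory allocation here means allocating heap memory. With 512 bytes as the boundary, large objects are allocated directly, and small objects are sorted into size classes in steps of 8 bytes (16 bytes on 64-bit builds since CPython 3.8).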
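The rounding described above can be sketched in a few lines. This is only arithmetic under the stated assumptions (512-byte threshold, 8-byte alignment). The constant names are borrowed from CPython's allocator source, but block_size is a hypothetical helper, not an interpreter API.

```python
SMALL_REQUEST_THRESHOLD = 512  # bytes; larger requests bypass the small-object allocator
ALIGNMENT = 8                  # assumed block alignment, as described in the text

def block_size(nbytes, alignment=ALIGNMENT):
    """Return the block size a small request is rounded up to,
    or None if the request does not qualify as a small object."""
    if nbytes <= 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None
    return (nbytes + alignment - 1) // alignment * alignment

for request in (1, 8, 9, 100, 512, 513):
    print(request, "->", block_size(request))
```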

As for the actual implementation, forgive me, I cannot read the C code.

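Python's garbage collection

Reference counting

Every object has a reference counter in its header. The count goes up when a new reference to the object is created and down when a reference goes away. When the count reaches zero, the deallocation function is called immediately to clean up the object and free its memory.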
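A small sketch of watching the counter move with sys.getrefcount. The absolute numbers are a CPython implementation detail (the call itself holds a temporary reference), so only the differences matter here. The Node class is made up for the example.

```python
import sys

class Node:
    pass

obj = Node()
before = sys.getrefcount(obj)       # includes the temporary reference held by the call
alias = obj                         # one more reference to the same object
after_alias = sys.getrefcount(obj)
del alias                           # that reference is dropped again
after_del = sys.getrefcount(obj)

print(before, after_alias, after_del)
```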

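Mark and sweep

To handle reference cycles, every container object gets an extra GC header that links it into a doubly linked list, so the collector can walk all tracked objects and use the list as a detection mechanism. Objects that turn out to be kept alive only by references from within a cycle (their reference count is still above 0 after every outside reference has been deleted) are reclaimed.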
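A minimal sketch of a cycle that reference counting alone cannot free, observed through a weak reference. The Node class is hypothetical, and automatic collection is switched off only so the demo is deterministic.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

gc.disable()                 # rule out an automatic collection during the demo
a, b = Node(), Node()
a.other, b.other = b, a      # a <-> b form a reference cycle
probe = weakref.ref(a)       # lets us observe whether the object is still alive

del a, b                     # counts stay above zero because of the cycle
alive_after_del = probe() is not None

gc.collect()                 # the cycle detector finds and frees the pair
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)
```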

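Generational collection

(I have also heard it called "cross-generation" collection, I forget =.=) Objects are divided into three generations, each with its own threshold and counter. On each collection the reference counts are used to divide the objects into reachable and unreachable: reachable (surviving) objects are moved to the next older generation, and unreachable objects are freed.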
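The thresholds and counters can be inspected through the gc module. This sketch prints them without assuming particular values, since the defaults differ between Python versions.

```python
import gc

thresholds = gc.get_threshold()   # one threshold per generation
counts = gc.get_count()           # current counter per generation
print("thresholds:", thresholds)
print("counts:", counts)

# Collect only the youngest generation; the return value is the
# number of unreachable objects that were found.
found = gc.collect(0)
print("unreachable objects found in generation 0:", found)
```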

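How much memory does each Python object use?

sys.getsizeof(obj) returns the size of an object, but it does not count the attribute values of an object instance, because those are only references. Under the hood it calls the

magic method: __sizeof__(self)
Here is a small program that prints the sizes of various objects: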

>>> class A(object): pass
>>> class B: pass
>>> import sys
>>> for x in (None, 1, 1.2, 'c', [], (), {}, set(), B, B(), A, A()):
...     print ("{0:20s}\t{1:d}".format(type(x).__name__, sys.getsizeof(x)))
...
NoneType                16
int                     28
float                   24
str                     50
list                    64
tuple                   48
dict                    240
set                     224
type                    1056
B                       56
type                    1056
A                       56
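To illustrate that sys.getsizeof is shallow, the sketch below compares a list holding a small integer with a list holding a large string. The variable names are illustrative and the exact byte counts depend on the Python version, so none are claimed here.

```python
import sys

small = [0]
payload = "x" * 10_000
holder = [payload]        # one element as well, but it refers to a large string

shallow_small = sys.getsizeof(small)
shallow_holder = sys.getsizeof(holder)   # counts the list itself, not the string
# Adding the element's own size gives a (still rough) deeper estimate.
deep_holder = shallow_holder + sys.getsizeof(payload)

print(shallow_small, shallow_holder, deep_holder)
```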

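Note that tuple = 48 while dict = 240, five times as much (the exact numbers depend on the Python version and platform). This shows how much space __slots__ on a new-style class can save, because a class with __slots__ drops the per-instance dict. It is sometimes claimed that __slots__ trades speed for space, with attribute lookup going from O(1) with a dict to O(n). That is not the case in CPython: each slot is a descriptor with a fixed offset in the instance, so access stays constant time. The real cost is flexibility: instances cannot gain new attributes, and multiple inheritance becomes more restrictive. __slots__ is therefore best suited to small classes with few attributes that are instantiated in large numbers.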

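__slots__ declares the instance attributes that a class allows; names that are not listed cannot be assigned on its instances.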
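A short sketch comparing an ordinary class with a __slots__ class. PointDict and PointSlots are hypothetical names, and the printed sizes vary by Python version.

```python
import sys

class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ("x", "y")       # only these two attributes may be set

    def __init__(self, x, y):
        self.x, self.y = x, y

p, q = PointDict(1, 2), PointSlots(1, 2)
dict_total = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
slots_total = sys.getsizeof(q)   # there is no per-instance __dict__ to add
print(dict_total, slots_total)

try:
    q.z = 3                      # z is not listed in __slots__
except AttributeError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False
```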

__all__ declares which names a module exports when it is imported with from module import *.
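The effect of __all__ can be tried without creating files by building a throwaway module in memory. The module name demo_mod and its two functions are invented for this sketch.

```python
import sys
import types

source = """
__all__ = ['public']

def public():
    return 'exported'

def helper():
    return 'not exported by *'
"""

# Create the module object, run its source, and register it for import.
demo_mod = types.ModuleType("demo_mod")
exec(source, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# Run a star-import in a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_mod import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Names left out of __all__ are still reachable explicitly, for example as demo_mod.helper.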

Reference counts that never reach 0

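When memory leaks, an object's reference count may never drop to 0. I actually used to think Python had no memory leaks, since high-level languages tend to have solid GCs. I was too naive.

As long as garbage collection has not been disabled, memory leaks in Python come in two forms:

either the object is referenced by another object with a longer lifetime, such as an object in the global scope;
or a reference cycle contains an object with __del__ (before Python 3.4 such cycles could not be collected; since PEP 442 they can).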

_cache = []

class OBJ(object):
    pass


def func_to_leak():
    o = OBJ()
    _cache.append(o)

    if True: 
        return
    _cache.remove(o)

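# The object is appended to a global list and the early return means it is never removed.
# Globals are generally only cleaned up when the program exits, so this counts as a memory leak.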
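One possible way to avoid this kind of leak, sketched under the assumption that the cache only needs to know about objects that are still in use elsewhere, is a weakref.WeakSet. The function name func_without_leak is hypothetical, and the immediate cleanup shown relies on CPython's reference counting.

```python
import weakref

class OBJ:
    pass

_cache = weakref.WeakSet()   # holds its members without keeping them alive

def func_without_leak():
    o = OBJ()
    _cache.add(o)
    return len(_cache)       # the object is still referenced by the local name here

size_inside = func_without_leak()
size_after = len(_cache)     # the object was freed when the function returned
print(size_inside, size_after)
```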

class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection


class Connection(object):
    def __init__(self):
        self._conn_handler = ConnectionHandler(self)
# Reference cycle: Connection -> ConnectionHandler -> Connection
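A common way to break such a cycle is to make the back-reference weak. The sketch below reuses the class names from above and adds a hypothetical connection() accessor. Cycle collection is disabled only to show that reference counting alone is enough here.

```python
import gc
import weakref

class ConnectionHandler:
    def __init__(self, connection):
        # Weak back-reference: it does not raise the connection's reference count.
        self._conn = weakref.ref(connection)

    def connection(self):
        return self._conn()      # None once the connection has been freed

class Connection:
    def __init__(self):
        self._conn_handler = ConnectionHandler(self)

gc.disable()                     # show that no cycle collection is needed
conn = Connection()
handler = conn._conn_handler
linked = handler.connection() is conn
del conn                         # last strong reference gone, so the object is freed
freed = handler.connection() is None
gc.enable()
print(linked, freed)
```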

Singleton pattern

This was the one freebie question, and in my excitement I turned it into a fatal one.

class Foo(object):
    _instance = None

    def __new__(cls):
        # Create the single instance on first use, then keep returning it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
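One detail worth knowing about a __new__-based singleton is that __init__ still runs on every call, even though the same object is returned. The Settings class below is a hypothetical example that makes this visible.

```python
class Settings:
    """Hypothetical singleton used only for this illustration."""
    _instance = None
    init_calls = 0

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, name="default"):
        # Runs on every Settings(...) call, although the object is reused.
        type(self).init_calls += 1
        self.name = name

first = Settings("a")
second = Settings("b")
print(first is second, first.name, Settings.init_calls)
```

Because the second call re-initialises the shared object, first.name ends up as "b". Guarding __init__ or moving the setup into __new__ avoids that if it is not wanted.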

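LRU

LRU stands for Least Recently Used.

The idea behind the LRU algorithm is: if a piece of data has not been accessed recently, it is unlikely to be accessed in the near future. In other words, when the limited space is full, the entry that has gone unused for the longest time should be evicted.

Its use case: when objects are stored in a limited space and the space fills up, existing objects are removed according to some policy. Common policies (algorithms) include LRU, FIFO and LFU.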
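To make the difference between two of these policies concrete, here is a small simulation that replays the same access trace under LRU and FIFO. The simulate helper and the trace are invented for illustration.

```python
from collections import OrderedDict

def simulate(accesses, capacity, policy):
    """Return the keys left in the cache after replaying `accesses`.
    `policy` is "lru" or "fifo"."""
    cache = OrderedDict()
    for key in accesses:
        if key in cache:
            if policy == "lru":
                cache.move_to_end(key)   # a hit refreshes recency under LRU only
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)    # evict the entry at the front
        cache[key] = True
    return list(cache)

trace = ["a", "b", "a", "c"]
lru_result = simulate(trace, 2, "lru")
fifo_result = simulate(trace, 2, "fifo")
print("LRU :", lru_result)
print("FIFO:", fifo_result)
```

With this trace LRU keeps "a" because it was touched again before "c" arrived, while FIFO evicts "a" simply because it was inserted first.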

from collections import OrderedDict


class LRUCache:
    """LRU cache backed by an OrderedDict (least recently used entry first)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key, default=None):
        if key not in self.cache:
            return default
        # A hit makes this key the most recently used one.
        self.cache.move_to_end(key)
        return self.cache[key]

    def set(self, key, value):
        if key in self.cache:
            # Existing key: refresh its position; the value is overwritten below.
            self.cache.move_to_end(key)
        elif len(self.cache) >= self.capacity:
            # Cache is full: evict the least recently used entry.
            self.cache.popitem(last=False)
        self.cache[key] = value

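Of course, Python 3.6 also ships a cache based on the LRU algorithm: functools.lru_cache.

The comments in the source below are translated, with a few notes of my own added.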

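from collections import namedtuple
from functools import _make_key, update_wrapper  # _make_key is a private helper of functools
from threading import RLock

_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])
def lru_cache(maxsize=128, typed=False):
    """
        Least-recently-used cache decorator.
        If maxsize is set to None, the LRU features are disabled and the cache can grow without bound.
        If typed is True, arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results.
        Arguments to the cached function must be hashable. View the cache statistics named tuple (hits, misses, maxsize, currsize) with f.cache_info().
        Clear the cache and statistics with f.cache_clear(). Access the underlying function with f.__wrapped__.
        See: http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used
    """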

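    # Users should only access the lru_cache through its public API:
    #       cache_info, cache_clear, and f.__wrapped__
    # The internals of the lru_cache are encapsulated for thread safety and
    # to allow the implementation to change (including a possible C version).

    # Early detection of an erroneous call to @lru_cache without any arguments
    # resulting in the inner function being passed to maxsize instead of an
    # integer or None.
    if maxsize is not None and not isinstance(maxsize, int):
        # Raise if maxsize is neither an int nor None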
        raise TypeError('Expected maxsize to be an integer or None')

    def decorating_function(user_function):
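        # The main logic lives in _lru_cache_wrapper, shown below (nothing was trimmed, so it is fairly long)
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)  # copy the wrapped function's metadata onto wrapper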

    return decorating_function

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
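    # Constants shared by all lru cache instances:
    sentinel = object()          # unique object used to signal cache misses
    make_key = _make_key         # build a cache key from the function arguments

    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # names for the link fields

    cache = {}
    hits = misses = 0
    full = False
    cache_get = cache.get    # bound method to look up a key or return None
    lock = RLock()           # linked-list updates are not thread-safe, so a lock is needed
    root = []                # root of the circular doubly linked list
    root[:] = [root, root, None, None]     # initialize by pointing to itself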

    if maxsize == 0:

        def wrapper(*args, **kwds):
            # No caching -- just a statistics update after a successful call
            nonlocal misses
            result = user_function(*args, **kwds)  # return value of the decorated function
            misses += 1
            return result

    elif maxsize is None:

        def wrapper(*args, **kwds):
            # Simple caching without ordering or size limit
            nonlocal hits, misses
            key = make_key(args, kwds, typed)  # build a cache key from the positional and keyword arguments
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            result = user_function(*args, **kwds)
            cache[key] = result
            misses += 1  
            return result

    else:

        def wrapper(*args, **kwds):
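            # Size-limited caching that tracks accesses by recency
            nonlocal root, hits, misses, full
            key = make_key(args, kwds, typed)
            with lock:
                link = cache_get(key)  # None means a cache miss
                if link is not None:
                    # Cache hit: move the link to the front of the circular queue
                    # (the most recently used position, just before root)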
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
            result = user_function(*args, **kwds)
            with lock:
                if key in cache:
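                    # Getting here means that this same key was added to the
                    # cache while the lock was released. Since the link
                    # update is already done, we need only return the
                    # computed result and update the count of misses.
                    pass
                elif full:  # cache is full: store the new result and evict the least recently used entry
                    # Use the old root to store the new key and result.
                    oldroot = root
                    oldroot[KEY] = key
                    oldroot[RESULT] = result
                    # Make the link after the old root the new root and
                    # clear its key and result. That link held the least
                    # recently used entry, which is thereby evicted.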
                    root = oldroot[NEXT]
                    oldkey = root[KEY]
                    oldresult = root[RESULT]
                    root[KEY] = root[RESULT] = None
                    # Now update the cache dictionary.
                    del cache[oldkey]
                    # Save the potentially reentrant cache[key] assignment
                    # for last, after the root and links have been put in
                    # a consistent state.
                    cache[key] = oldroot
                else:  # cache not full yet: just add a new link
                    # Insert the new link just before root
                    last = root[PREV]
                    link = [last, root, key, result]
                    last[NEXT] = root[PREV] = cache[key] = link
                    full = (len(cache) >= maxsize)
                misses += 1
            return result

    def cache_info():
        """Report cache statistics"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, len(cache))

    def cache_clear():
        """Clear the cache and cache statistics"""
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper

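To summarize:

With the LRU cache, a circular doubly linked list is created. On every cache hit the node is moved to the front of the queue, which indirectly records how recently each node was accessed. When the cache is full, the node at the tail of the queue, which is the least recently used one, is automatically "removed" to make room for the new entry.

Usage: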

from functools import lru_cache


@lru_cache(5)
def demo(i):
    print('demo', i)
    return i

for i in range(5):
    demo(i)

for i in range(5):
    demo(i)

Output:
demo 0
demo 1
demo 2
demo 3
demo 4
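The body runs only five times because the second loop is served entirely from the cache. A small sketch that confirms this through cache_info() and then triggers an eviction. The calls list is an illustrative addition that records real executions.

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=5)
def demo(i):
    calls.append(i)      # records real executions only
    return i

for _ in range(2):
    for i in range(5):
        demo(i)

info = demo.cache_info()
print(info)              # hits, misses, maxsize, currsize

demo(5)                  # sixth distinct key: the least recently used entry (0) is evicted
demo(0)                  # so this call has to run the function again
print(calls)
```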
