Python内置日志模块源码分析和进阶用法

概述

日志是在程序运行过程中进行事件追踪的方式,通过向程序引入日志模块可以使开发者方便地了解到特定的事件是否发生,以及事件发生时相关的信息。

本文假定你已经了解日志模块的基本使用(如通常博客、教材所介绍),想要定制化拓展你的日志功能(如定制化的日志持久化,在日志中记录特定属性(区分于message)等),或许本文的部分内容会对你有帮助。

本文介绍以下内容:

  1. python中内置的logging模块中关于日志记录的关键类设计(如日志事件对象LogRecord,日志处理类/接口Handler)

  2. 如何将日志记录到数据库中(以MySQL为例)

  3. 如何拓展日志事件记录字段

  4. 如何优雅地使用logging模块

源码分析

logging模块设计

从__all__这一特殊变量即可窥见logging模块有哪些主要的类与变量: Logger、Handler、Filter、Formatter、LogRecord等。

重要常量及其相关操作

日志级别

若干个正整数集合定义了我们在程序中常用的日志级别

模块架构图及日志数据流

当开发者调用logger.info、logger.warning之类的日志记录方法时,日志模块会根据已启用的日志级别(Level)判定如何处理日志事件,如果启用的日志级别低于当前记录的事件级别(如DEBUG低于ERROR),则日志模块会对当前事件进行记录:创建日志记录对象LogRecord,将logger关联的日志过滤器应用到LogRecord对象中,如果该日志应该被过滤,则不作后续处理,如果该日志被被过滤,则交给logger对象的日志处理类(Handler)处理。

LogRecord:日志事件对象

LogRecord类表示一个被记录的事件
LogRecord实例会在需要记录某些东西时被创建,包含需要记录事件的所有相关信息。日志的主要信息在于msg和args,LogRecord实例也包括了实例被创建的时间、创建过程被调用 的源代码模块、文件名、行数等信息。 

Handler: 日志处理类

Handler类的实例负责把日志时间分发到特定的目的地 .

Handler类 作为一个接口,其应该实现的功能为:把日志时间分发到特定的目的地。内置的Handler主要包括StreamHandler、FileHandler,前者可将日志记录以流的形式输出(如输出到控制台);后者可将日志记录持久化到文件中。

class StreamHandler(Handler):
    """
    A handler class which writes logging records, appropriately formatted,
    to a stream. Note that this class does not close the stream, as
    sys.stdout or sys.stderr may be used.
    """
class FileHandler(StreamHandler):
    """
    A handler class which writes formatted logging records to disk files.
    """

Handler可以选择性地使用格式器(Formatter)实例将日志记录格式化为期望的样式,默认情况下只记录原始记录信息(record.message)

默认情况下,logging模块没有将日志持久化到数据库的Handler子类,如果需要实现将日志持久化到数据库中,可以实现一个DBHandler类(继承Handler),重写其emit(record)方法:

    def emit(self, record):
        """
        Do whatever it takes to actually log the specified logging record.

        This version is intended to be implemented by subclasses and so
        raises a NotImplementedError.
        """
        raise NotImplementedError('emit must be implemented '
                                  'by Handler subclasses')

其它Handler

TimedRotatingFileHandler
class TimedRotatingFileHandler(BaseRotatingHandler):
    """
    Handler for logging to a file, rotating the log file at certain timed
    intervals.

    If backupCount is > 0, when rollover is done, no more than backupCount
    files are kept - the oldest ones are deleted.
    """
    def __init__(self, filename, when='h', interval=1, backupCount=0, encoding=None, delay=False, utc=False, atTime=None):
        BaseRotatingHandler.__init__(self, filename, 'a', encoding, delay)
        self.when = when.upper()
        self.backupCount = backupCount
        self.utc = utc
        self.atTime = atTime
        # Calculate the real rollover interval, which is just the number of
        # seconds between rollovers.  Also set the filename suffix used when
        # a rollover occurs.  Current 'when' events supported:
        # S - Seconds
        # M - Minutes
        # H - Hours
        # D - Days
        # midnight - roll over at midnight
        # W{0-6} - roll over on a certain day; 0 - Monday
        #
        # Case of the 'when' specifier is not important; lower or upper case
        # will work.
        if self.when == 'S':
            self.interval = 1 # one second
            self.suffix = "%Y-%m-%d_%H-%M-%S"
            self.extMatch = r"^\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}(\.\w+)?$"
        elif self.when == 'M':
            self.interval = 60 # one minute
            self.suffix = "%Y-%m-%d_%H-%M"
            self.extMatch = r"^\d{4}-\d{2}-\d{2}_\d{2}-\d{2}(\.\w+)?$"
        elif self.when == 'H':
            self.interval = 60 * 60 # one hour
            self.suffix = "%Y-%m-%d_%H"
            self.extMatch = r"^\d{4}-\d{2}-\d{2}_\d{2}(\.\w+)?$"
        elif self.when == 'D' or self.when == 'MIDNIGHT':
            self.interval = 60 * 60 * 24 # one day
            self.suffix = "%Y-%m-%d"
            self.extMatch = r"^\d{4}-\d{2}-\d{2}(\.\w+)?$"
        elif self.when.startswith('W'):
            self.interval = 60 * 60 * 24 * 7 # one week
            if len(self.when) != 2:
                raise ValueError("You must specify a day for weekly rollover from 0 to 6 (0 is Monday): %s" % self.when)
            if self.when[1] < '0' or self.when[1] > '6':
                raise ValueError("Invalid day specified for weekly rollover: %s" % self.when)
            self.dayOfWeek = int(self.when[1])
            self.suffix = "%Y-%m-%d"
            self.extMatch = r"^\d{4}-\d{2}-\d{2}(\.\w+)?$"
        else:
            raise ValueError("Invalid rollover interval specified: %s" % self.when)

        self.extMatch = re.compile(self.extMatch, re.ASCII)
        self.interval = self.interval * interval # multiply by units requested
        # The following line added because the filename passed in could be a
        # path object (see Issue #27493), but self.baseFilename will be a string
        filename = self.baseFilename
        if os.path.exists(filename):
            t = os.stat(filename)[ST_MTIME]
        else:
            t = int(time.time())
        self.rolloverAt = self.computeRollover(t)

# ......................

并发安全性设计

由于一个程序中可能有多个线程都会进行日志记录,因此如何保证日志并发写入的安全性是日志模块面临的一个重要问题。logging模块使用threading类的对象实现多线程及线程同步安全机制。如使用RLock作为模块层级的同步安全基础对象,在访问或修改全局变量时会调用_acquireLock(),并在变量修改完成后调用_releaseLock()。

#---------------------------------------------------------------------------
#   Thread-related stuff
#---------------------------------------------------------------------------

#
#_lock is used to serialize access to shared data structures in this module.
#This needs to be an RLock because fileConfig() creates and configures
#Handlers, and so might arbitrary user threads. Since Handler code updates the
#shared dictionary _handlers, it needs to acquire the lock. But if configuring,
#the lock would already have been acquired - so we need an RLock.
#The same argument applies to Loggers and Manager.loggerDict.
#
_lock = threading.RLock()

def _acquireLock():
    """
    Acquire the module-level lock for serializing access to shared data.

    This should be released with _releaseLock().
    """
    if _lock:
        _lock.acquire()

def _releaseLock():
    """
    Release the module-level lock acquired by calling _acquireLock().
    """
    if _lock:
        _lock.release()

数据库持久化拓展(代码)

PROPERTY_EXTRA_ID = "extra_id"



class DBHandler(logging.Handler):
    """将日志内容写入到数据库中"""

    def __init__(self, level=logging.NOTSET):
        """
        初始化数据库
        """
        super().__init__(level=level)
        self.logdao = BusinessDAO(db_config=config.DB_CONFIG_PATH)
        self.logdb = self.logdao.connect_db()
        self.session = self.logdb.get_session()

    def emit(self, record: logging.LogRecord) -> None:
        """
        重写emit方法,将record记录到数据库中
        :param record:
        :return:
        """
        try:
            self.session.add(Log(level_name=record.levelname,
                                 module=record.module,
                                 thread_name=record.threadName,
                                 file_name=record.filename,
                                 func_name=record.funcName,
                                 line_no=record.lineno,
                                 process=record.process,
                                 process_name=record.processName,
                                 content=record.message,
                                 extra_id=getattr(record, PROPERTY_EXTRA_ID, None),
                                 created_time=datetime.datetime.strptime(record.asctime, DATE_FORMAT)
                                 ))
            self.session.commit()
        except Exception as e:
            print("Log to DB failed...", repr(e))

    def __del__(self):
        self.session.close()

设计模式推荐

  • 单例模式

    getLogger(name)针对唯一的name只会返回唯一的对象,如果name对象的Logger未创建,则会返回默认的对象(提前创建好的),若已创建则会返回已有对象

    def getLogger(name=None):
        """
        Return a logger with the specified name, creating it if necessary.
    
        If no name is specified, return the root logger.
        """
        if name:
            return Logger.manager.getLogger(name)
        else:
            return root
    
  • 组合模式

    一个日志记录器(Logger)有多个日志处理类(Handler),多个Handler组合起来作为Logger的一个成员对象(handlers)

    class Logger(Filterer):
        """
        Instances of the Logger class represent a single logging channel. A
        "logging channel" indicates an area of an application. Exactly how an
        "area" is defined is up to the application developer. Since an
        application can have any number of areas, logging channels are identified
        by a unique string. Application areas can be nested (e.g. an area
        of "input processing" might include sub-areas "read CSV files", "read
        XLS files" and "read Gnumeric files"). To cater for this natural nesting,
        channel names are organized into a namespace hierarchy where levels are
        separated by periods, much like the Java or Python package namespace. So
        in the instance given above, channel names might be "input" for the upper
        level, and "input.csv", "input.xls" and "input.gnu" for the sub-levels.
        There is no arbitrary limit to the depth of nesting.
        """
        def __init__(self, name, level=NOTSET):
            """
            Initialize the logger with a name and an optional level.
            """
            Filterer.__init__(self)
            self.name = name
            self.level = _checkLevel(level)
            self.parent = None
            self.propagate = True
            self.handlers = []
            self.disabled = False
            self._cache = {}
    
  • 装饰器模式

    Formatter用于对日志消息的格式进行改变

参考资料

CoolCats
CoolCats
理学学士

我的研究兴趣是时空数据分析、知识图谱、自然语言处理与服务端开发