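Working with Prometheus in Python: A Complete prometheus-client Tutorial and Practical Guide

Installation and Environment Setup

Installing prometheus-client

The first step in connecting a Python project to Prometheus is installing prometheus-client. A pip install brings in the dependency quickly; check that the release you install supports your Python version. This article is a complete tutorial and practical guide to prometheus-client, the Python client for Prometheus, aimed at helping you put monitoring into real projects.

To avoid dependency conflicts across projects, run the installation inside a virtual environment. An isolated environment improves stability and reproducibility, and makes it easier to pin dependency versions later.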

pip install prometheus-client

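Runtime Environment and Dependencies

Make sure the runtime environment has stable networking and can keep the process running long enough to be scraped. Prefer a virtual environment or a containerized deployment, and pin dependency versions with requirements.txt or Pipfile.lock.

If the application will run in a container, pin the base image version in the Dockerfile and configure the path and port of the metrics endpoint explicitly. A consistent runtime environment helps avoid the awkward "works on my machine" situation.

Quick Verification

To quickly verify the basics of prometheus-client, write a small script that exposes a metric and then request /metrics to confirm that the monitoring endpoint is reachable.

The example below uses a counter as a demonstration: it starts a local HTTP server and then keeps incrementing the metric. Confirm that the metrics are visible at /metrics on the target port.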

from prometheus_client import start_http_server, Counter
import time

REQUEST_COUNTER = Counter('quick_test_requests_total', 'Total requests')

if __name__ == '__main__':
    # Serve metrics at http://localhost:8000/metrics
    start_http_server(8000)
    while True:
        REQUEST_COUNTER.inc()
        time.sleep(1)
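If you want to see what the endpoint will return without opening a port, the exposition text can be rendered in-process. The sketch below is only an illustration: it uses a private CollectorRegistry (an assumption made here so the example does not touch the default registry) and prints the same text format that /metrics serves.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# A private registry keeps this sketch isolated from the default global one.
registry = CollectorRegistry()
requests_total = Counter('quick_test_requests_total', 'Total requests', registry=registry)

# Simulate two handled requests.
requests_total.inc()
requests_total.inc()

# generate_latest returns bytes in the Prometheus text exposition format.
metrics_text = generate_latest(registry).decode('utf-8')
print(metrics_text)
```

The printed text should contain HELP and TYPE comment lines followed by the sample line for the counter.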

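prometheus-client Basics

Creating and Exporting Metrics

The Prometheus client offers several metric types; the most common are Counter, Gauge, Summary, and Histogram. A Counter can only increase, which suits request counts, error counts, and similar totals. A Gauge can go up and down, which suits current concurrency, queue length, and similar values. Summary and Histogram record distribution information, such as request latency, for later analysis.

The example below defines three of these metric types (a Counter, a Histogram, and a Gauge) to show how to declare and use them in an application.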

from prometheus_client import Counter, Gauge, Histogram, start_http_server
import random, time

# Metric definitions
REQUEST_COUNT = Counter('example_requests_total', 'Total requests')
RESPONSE_TIME = Histogram('example_response_seconds', 'Response time in seconds', buckets=[0.1, 0.2, 0.5, 1, 2, 5])
CURRENT_USERS = Gauge('example_current_users', 'Current number of users')

if __name__ == '__main__':
    start_http_server(8001)
    CURRENT_USERS.set(5)
    while True:
        REQUEST_COUNT.inc()
        with RESPONSE_TIME.time():
            time.sleep(random.uniform(0.05, 0.3))  # simulate processing time
        CURRENT_USERS.inc()
        time.sleep(1)
        CURRENT_USERS.dec()
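The difference between a Counter and a Gauge is easiest to see by reading the values back. This is a minimal sketch with made-up metric names (demo_errors and demo_in_flight are illustrative, not from the article) and a private registry.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge

registry = CollectorRegistry()
errors = Counter('demo_errors', 'Errors seen', registry=registry)
in_flight = Gauge('demo_in_flight', 'Requests in flight', registry=registry)

# Counters accept only non-negative increments.
errors.inc()
errors.inc(3)
try:
    errors.inc(-1)
    rejected_negative = False
except ValueError:
    rejected_negative = True

# Gauges can be set, raised, and lowered.
in_flight.set(5)
in_flight.inc(2)
in_flight.dec()

# Counter samples are exposed with a _total suffix.
errors_value = registry.get_sample_value('demo_errors_total')
in_flight_value = registry.get_sample_value('demo_in_flight')
print(errors_value, in_flight_value, rejected_negative)
```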

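Exposing the Metrics Endpoint

Metrics are usually exposed over an HTTP endpoint. The simplest option is start_http_server, which serves /metrics directly from within the application. You can also combine prometheus_client with a web framework so that the monitoring endpoint fits into the existing application structure.

Three common approaches follow: a standalone HTTP server, a Flask route, and a WSGI app mounted with DispatcherMiddleware. Whichever you choose, make sure Prometheus can scrape /metrics.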

# Option A: standalone HTTP server
from prometheus_client import start_http_server

start_http_server(8000)
# Run your application logic here...

# Option B: a Flask route
from flask import Flask
from prometheus_client import Counter, generate_latest, CONTENT_TYPE_LATEST

app = Flask(__name__)
REQUESTS = Counter('flask_requests_total', 'Total requests')

@app.route('/metrics')
def metrics():
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}

# Option C: mount the WSGI metrics app next to the Flask app
from prometheus_client import make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware
from flask import Flask

app = Flask(__name__)
# Your other routes go here
# Expose /metrics to Prometheus
application = DispatcherMiddleware(app.wsgi_app, {
    '/metrics': make_wsgi_app()
})
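Because make_wsgi_app returns an ordinary WSGI callable, it can be exercised without a running server, which is handy in unit tests. The following is a rough sketch: call_wsgi is a hypothetical helper written for this illustration, and the registry and metric name are assumptions.

```python
from wsgiref.util import setup_testing_defaults
from prometheus_client import CollectorRegistry, Counter, make_wsgi_app

registry = CollectorRegistry()
page_views = Counter('demo_page_views', 'Page views', registry=registry)
page_views.inc()

# Build a WSGI app that serves only this registry.
metrics_app = make_wsgi_app(registry)

def call_wsgi(app, path='/metrics'):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal GET request
    environ['PATH_INFO'] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = dict(headers)

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body.decode('utf-8')

status, headers, body = call_wsgi(metrics_app)
print(status)
```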

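Pushing Directly to a Pushgateway (Advanced)

For batch jobs, short-lived processes, and similar workloads, you can push metrics to a Pushgateway instead of exposing an endpoint; Prometheus then scrapes the Pushgateway. This gives short tasks that may exit before a scrape a place to leave their final metrics. Note that the Pushgateway keeps the last pushed values for each grouping key rather than aggregating across pushes.

A simple push example: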

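from prometheus_client import CollectorRegistry, push_to_gateway, Gauge

registry = CollectorRegistry()
# set_to_current_time() stores a Unix timestamp, so the metric is named accordingly
g = Gauge('job_last_success_unixtime', 'Last time the batch job successfully finished', registry=registry)

# Update the metric
g.set_to_current_time()
# Push once the job has finished
push_to_gateway('127.0.0.1:9091', job='batch_job', registry=registry)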
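To see what a push actually sends, one option is to point the client at a throwaway local HTTP server in place of a real Pushgateway. The sketch below does exactly that; FakeGateway is a stand-in written for this illustration (it is not part of prometheus_client and does not behave like a full Pushgateway), and the metric name is an assumption.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

received = []

class FakeGateway(BaseHTTPRequestHandler):
    """Stand-in for a Pushgateway that only records incoming PUT requests."""

    def do_PUT(self):
        length = int(self.headers.get('Content-Length', 0))
        payload = self.rfile.read(length).decode('utf-8')
        received.append((self.command, self.path, payload))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet

# Port 0 asks the OS for any free port.
server = HTTPServer(('127.0.0.1', 0), FakeGateway)
threading.Thread(target=server.serve_forever, daemon=True).start()

registry = CollectorRegistry()
duration = Gauge('batch_job_duration_seconds', 'Duration of the last batch run', registry=registry)
duration.set(12.5)

push_to_gateway('127.0.0.1:%d' % server.server_port, job='batch_job', registry=registry)
server.shutdown()

method, path, payload = received[0]
print(method, path)
```

The job name becomes part of the request path, which is how the Pushgateway groups pushed metrics.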

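Advanced: Custom Metrics and Aggregation

Using Metric Labels

Labels let you break a single metric down along several dimensions, for example request counts grouped by HTTP method and endpoint. Declare a label for each dimension you need to aggregate on, and pass the matching label values when recording.

The example below shows how to define a labeled metric and how to record data for different label combinations.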

from prometheus_client import Counter, start_http_server
import time

# A metric with labels
REQUESTS = Counter('http_requests_total', 'HTTP requests total', ['method', 'endpoint'])

if __name__ == '__main__':
    start_http_server(8002)
    # Simulate different requests
    REQUESTS.labels('GET', '/api/v1/items').inc()
    REQUESTS.labels('POST', '/api/v1/items').inc()
    REQUESTS.labels('GET', '/api/v1/items').inc()
    # Keep the process alive so /metrics can still be scraped
    while True:
        time.sleep(1)
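Each distinct combination of label values becomes its own time series. A small sketch, assuming a private registry and keyword-style label arguments, makes that visible by reading the per-combination values back.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()
REQUESTS = Counter('http_requests_total', 'HTTP requests total',
                   ['method', 'endpoint'], registry=registry)

# Keyword arguments make it harder to mix up the label order.
REQUESTS.labels(method='GET', endpoint='/api/v1/items').inc()
REQUESTS.labels(method='POST', endpoint='/api/v1/items').inc()
REQUESTS.labels(method='GET', endpoint='/api/v1/items').inc()

# One series per label combination.
get_count = registry.get_sample_value(
    'http_requests_total', {'method': 'GET', 'endpoint': '/api/v1/items'})
post_count = registry.get_sample_value(
    'http_requests_total', {'method': 'POST', 'endpoint': '/api/v1/items'})
print(get_count, post_count)
```

Since every new label value adds a series, labels work best for dimensions with a small, bounded set of values.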

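Recording Distributions

Summary and Histogram are suited to recording distributions for later quantile analysis and capacity planning. A Histogram has buckets, which makes it suitable for fine-grained analysis of response-time distributions, and its buckets can be aggregated across instances on the Prometheus side. A Summary in the Python client exposes only the count and sum of observations, not quantiles, so use a Histogram when you need percentiles.

A decorator or a context manager makes it easy to time critical code paths.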

from prometheus_client import Summary, start_http_server
import time

REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')

@REQUEST_TIME.time()
def process():
    time.sleep(0.2)

if __name__ == '__main__':
    start_http_server(8003)
    while True:
        process()
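To make the Histogram and Summary behavior concrete, the sketch below feeds the same three observations into both and inspects the resulting samples. The metric names and the bucket helper are illustrative assumptions; the bucket boundaries are the ones used earlier in this article.

```python
from prometheus_client import CollectorRegistry, Histogram, Summary

registry = CollectorRegistry()
latency_hist = Histogram('demo_latency_seconds', 'Latency (histogram)',
                         buckets=[0.1, 0.2, 0.5, 1, 2, 5], registry=registry)
latency_summary = Summary('demo_latency_summary_seconds', 'Latency (summary)',
                          registry=registry)

for seconds in (0.15, 0.3, 3.0):
    latency_hist.observe(seconds)
    latency_summary.observe(seconds)

def bucket(le):
    """Read the cumulative count of observations <= the given upper bound."""
    return registry.get_sample_value('demo_latency_seconds_bucket', {'le': le})

# Buckets are cumulative: each one counts everything at or below its bound.
bucket_counts = {le: bucket(le) for le in ('0.1', '0.2', '0.5', '+Inf')}

# Collect the Summary's samples to see which series it exposes.
summary_samples = [
    sample
    for metric in registry.collect()
    for sample in metric.samples
    if sample.name.startswith('demo_latency_summary_seconds')
]
summary_names = {sample.name for sample in summary_samples}
print(bucket_counts)
print(sorted(summary_names))
```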

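Practical Guide: Putting Prometheus into Your Application

Integrating with Flask/FastAPI

Integrating Prometheus metrics into a web framework usually means counting requests before or after routing, and in most cases also exposing a metrics endpoint so that Prometheus can scrape it. Placing the instrumentation at the application entry point captures statistics for all requests.

Below is a simple example that counts incoming requests in a Flask application and implements a /metrics endpoint. You can also use make_wsgi_app to mount the metrics endpoint in an existing application.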

from flask import Flask, request
from prometheus_client import Counter, generate_latest, CONTENT_TYPE_LATEST

REQUEST_COUNTER = Counter('flask_incoming_requests_total', 'Total incoming requests', ['method', 'endpoint'])

app = Flask(__name__)

@app.before_request
def before():
    # request.path is the raw URL path, so paths containing IDs each create their own series
    REQUEST_COUNTER.labels(request.method, request.path).inc()

@app.route('/metrics')
def metrics():
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}
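The same entry-point idea can be expressed without depending on a particular framework, as a WSGI middleware. This is a sketch rather than a recommended implementation: MetricsMiddleware, hello_app, and the metric names are all hypothetical, and the latency covers only the time until the wrapped app returns its response iterable.

```python
import time
from wsgiref.util import setup_testing_defaults
from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()
REQUESTS = Counter('app_requests', 'Requests by method and status',
                   ['method', 'status'], registry=registry)
LATENCY = Histogram('app_request_duration_seconds', 'Request latency',
                    ['method'], registry=registry)

class MetricsMiddleware:
    """Wrap a WSGI app and record a request count and latency per call."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        method = environ['REQUEST_METHOD']
        started = time.perf_counter()

        def recording_start_response(status, headers, exc_info=None):
            # The status string looks like "200 OK"; keep only the code.
            REQUESTS.labels(method=method, status=status.split(' ')[0]).inc()
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            LATENCY.labels(method=method).observe(time.perf_counter() - started)

def hello_app(environ, start_response):
    """A tiny WSGI app used only to drive the middleware."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']

wrapped = MetricsMiddleware(hello_app)
environ = {}
setup_testing_defaults(environ)  # minimal GET request
response_body = b''.join(wrapped(environ, lambda status, headers, exc_info=None: None))

request_count = registry.get_sample_value(
    'app_requests_total', {'method': 'GET', 'status': '200'})
latency_count = registry.get_sample_value(
    'app_request_duration_seconds_count', {'method': 'GET'})
print(request_count, latency_count)
```

With Flask, a wrapper like this would typically be installed with app.wsgi_app = MetricsMiddleware(app.wsgi_app).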

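Containerization and Deployment

In multi-process deployments, which are common in containers, the default prometheus_client endpoint returns inconsistent data: each worker process keeps its own metrics, so a scrape only sees the worker that happened to answer. Use multiprocess mode and point it at a shared directory for aggregation to get a consistent global view.

The usual approach is to set PROMETHEUS_MULTIPROC_DIR to a shared volume in the container deployment, and then initialize the multiprocess collector when the process starts. The example configuration below is intended for use with a process manager such as Gunicorn.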

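# With Gunicorn
import os
from prometheus_client import start_http_server, multiprocess, CollectorRegistry

# The multiprocess aggregation directory (a shared volume); raises KeyError if it is unset
PROMETHEUS_MULTIPROC_DIR = os.environ['PROMETHEUS_MULTIPROC_DIR']

registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
# Serve the aggregated registry rather than the default per-process one
start_http_server(8004, registry=registry)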
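In multiprocess mode, files left behind by exited workers should also be cleaned up. One way to do that under Gunicorn is the child_exit server hook, sketched below as a configuration fragment; placing it in a gunicorn.conf.py file is an assumption about how your deployment is configured.

```python
# gunicorn.conf.py (assumed config file for this sketch)
from prometheus_client import multiprocess

def child_exit(server, worker):
    # Gunicorn calls this hook after a worker process exits;
    # mark_process_dead removes that worker's live-gauge files
    # from the PROMETHEUS_MULTIPROC_DIR directory.
    multiprocess.mark_process_dead(worker.pid)
```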

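Common Issues and Performance Notes

Under high concurrency, make sure metric recording does not become a bottleneck. Avoid expensive work in tight loops when updating metrics, and move costly computations to asynchronous or background tasks. Also keep thread safety in mind when metric objects are updated from multiple threads.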
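One way to keep metric updates out of a hot path is to let a Gauge compute its value only when it is scraped, using set_function. The sketch below assumes an in-memory queue named pending and a private registry, both made up for this illustration.

```python
from collections import deque
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()
pending = deque()  # hypothetical work queue

QUEUE_DEPTH = Gauge('work_queue_depth', 'Items waiting in the queue', registry=registry)
# The callback runs at collection time, so the hot path never touches the gauge.
QUEUE_DEPTH.set_function(lambda: len(pending))

# Application code only manipulates the queue.
pending.extend(['a', 'b', 'c'])

depth = registry.get_sample_value('work_queue_depth')
print(depth)
```

The callback should stay cheap, since it runs on every scrape.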

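If the application runs as multiple processes or multiple instances, rely on Prometheus's pull model: let each instance export only its own metrics rather than duplicating large amounts of pre-aggregated data, and aggregate the views in the central Prometheus server.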
