
Dive into Sentry

PyCon China 2015

Published in: Technology

  1. Dive into Sentry
     The modern error logging and aggregation platform
     XTao, 09.19.2015, Beijing
  2. Beijing/Shanghai/Guangzhou 0xFF Life's pathetic, go Pythonic!
     Xu Tao (徐涛)
     ● @ Douban
     ● (?:Product development|Operations) engineer
     ● (?:CODE|DevOps|Git|Python)
     ● 2014 PyConChina Beijing
     ● https://blog.xtao.me
     ❏ Douban: @xtaooooo
     ❏ Twitter: @xtao
     ❏ GitHub: @xtao
  3. Sentry overview
     Sentry grew out of Disqus: https://engineering.disqus.com/
     Sentry history
     What Sentry is
     DEMO
  4. Origins
     ● 2010
     ● http://disqus.com/
     ● django-db-log (the grandfather project)
     ● tl;dr Sentry and Raven are StarCraft 2 units.
     ● driven-by-open-source
     commit 3c2e87573d3bd16f61cf08fece0638cc47a4fc22
     Author: David Cramer <dcramer@gmail.com>
     Date: Mon May 12 16:26:19 2008 +0000
         initial working code
      djangodblog/__init__.py | 35 +++++++++++++++++++++++++++++++++++
      djangodblog/models.py | 36 ++++++++++++++++++++++++++++++++++++
      2 files changed, 71 insertions(+)
  5. Sentry 5
     ● 2012
     ● Protocol Version 3
     ● branch: 5.4.x-maint
  6. Sentry 6
     ● 2013
     ● Protocol Version 4
     ● Protocol Version 5
     ● Alerts
     ● Filters
  7. Sentry 7
     ● 2014
     ● Organizations
     ● TSDB
     ● Rules
     ● Web API
     ● Protocol Version 6
     ● BIGINT
     ● Help Pages
  8. Sentry 8
     ● 2015 ?
     ● Most of the application has been overhauled and rewritten on top of React and our web API.
     ● beta
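  9. What Sentry is
     ● An error logging and aggregation platform
       ○ Server: Sentry (The Sentry Open Source Server)
       ○ Client: The Raven Clients.
  10. Why use Sentry
      ● Detailed error information
        ○ Down to the line of code (Python)
        ○ Down to the individual variable (Python)
      ● Detailed error classification
        ○ Tag
      ● Alerts
      ● Sensible handling of duplicate errors
      ● Support for many languages
        ○ Python is well supported
      ❏ Extra benefits
      ❏ Getting started
        ❏ A very good Django project: if you want to learn how to use Django, read the Sentry source
      ❏ Advanced
        ❏ Sentry counts as a medium-sized web project: if you lack web development experience, you can pick some up from the source
      ❏ Open source
        ❏ An example of an open-source Python web application; its data migrations are reliable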
  11. Sentry - Server (7.x)
      ● Backend
        ○ Python
        ○ Django
        ○ Celery
      ● Frontend
        ○ jQuery
        ○ Backbone
        ○ Underscore
        ○ Bootstrap
        ○ Moment
      ● Database
        ○ MySQL
        ○ PostgreSQL
      ● KV
        ○ Cassandra
        ○ Riak
        ○ Redis
      ● Queue
        ○ Redis
        ○ RabbitMQ
  12. Raven - Officially supported clients
      ● Python
      ● JavaScript
      ● Node.js
      ● PHP
      ● Ruby
      ● Objective-C
      ● Java
      ● C#
      ● Go
  13. DEMO
      1. Hosted Sentry
         a. https://www.getsentry.com/signup/
         b. 14-day Free Trial
      2. Sentry On-Premise
         a. https://docs.getsentry.com/on-premise/server/installation/
         b. Sentry Internal
  14. Using Sentry
      How to submit errors
      Raven
      DSN
  15. Raven 101
      pip install raven --upgrade

      from raven import Client
      client = Client('___DSN___')
      try:
          1 / 0
      except ZeroDivisionError:
          client.captureException()

      def handle_request(request):
          client.context.merge({'user': {
              'email': request.user.email
          }})
          try:
              ...
          finally:
              client.context.clear()
  16. Raven 102
      ● WSGI middleware
      ● raven/middleware.py
      ```
      A WSGI middleware which will attempt to capture any
      uncaught exceptions and send them to Sentry.

      >>> from raven.base import Client
      >>> application = Sentry(application, Client())
      ```
  17. DSN 101
      '{PROTOCOL}://{PUBLIC_KEY}:{SECRET_KEY}@{HOST}/{PATH}{PROJECT_ID}'
      http://c44a73655e50454581da995bbedd392a:8d29447e0e8241b9a178fd726fb07190@onimaru.intra.douban.com/10
      udp://c44a73655e50454581da995bbedd392a:8d29447e0e8241b9a178fd726fb07190@onimaru-udp.intra.douban.com:4008/10
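The DSN template above maps directly onto the parts of a URL, so it can be taken apart with the standard library. The following is a minimal sketch, not Raven's own parser: the function name `parse_dsn` and the example host are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def parse_dsn(dsn):
    """Split PROTOCOL://PUBLIC_KEY:SECRET_KEY@HOST/PATH/PROJECT_ID into its parts.

    Hypothetical helper for illustration; Raven ships its own DSN handling.
    """
    parts = urlsplit(dsn)
    if not (parts.scheme and parts.username and parts.password and parts.hostname):
        raise ValueError("incomplete DSN: %r" % dsn)
    # The last path segment is the project id; anything before it is an
    # optional path prefix.
    prefix, _, project_id = parts.path.rpartition("/")
    if not project_id:
        raise ValueError("DSN has no project id: %r" % dsn)
    return {
        "protocol": parts.scheme,
        "public_key": parts.username,
        "secret_key": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the DSN does not name a port
        "path": prefix + "/",
        "project_id": project_id,
    }


if __name__ == "__main__":
    # Placeholder credentials and host, not a real Sentry endpoint.
    print(parse_dsn("udp://public:secret@sentry.example.com:4008/10"))
```

Keeping the public and secret keys in the user-info part of the URL is what lets a single string configure a client completely.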
  18. Sentry features (๑•̀ㅂ•́)و✧ (つд⊂)
      Event
      Group
      Protocol
      Interface
      TSDB
      Buffer
      Cache
  19. Event
      ● HTTP(DATA)
      ● UDP(DATA)
      ● EventManager
      ● Project
      ● Event
      ● Group
      ● EventMapping
        ○ event_id: uuid.uuid4().hex
      ● UserReport
        ○ User feedback on an issue in Sentry
      ● post_process_group.delay
      ● index_event.delay
  20. Group
      ● hashes
        ○ checksum (provided by client)
        ○ fingerprint / (default + fingerprint)
        ○ default (first interface ordered by score)
      ● find group
        ○ find group at GroupHash by hash
        ○ first matched group
      ● sample event (count, time)
      ● regression (resolved event)
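The hash priority on this slide (client checksum first, then fingerprint, then the default derived from the highest-scoring interface) can be illustrated with a small sketch. It is not Sentry's implementation: `compute_hashes`, `find_group`, the use of MD5, and the plain dict standing in for the GroupHash table are all assumptions made for illustration.

```python
import hashlib

DEFAULT_MARKER = "{{ default }}"


def md5_of(parts):
    """Hash a sequence of values into one hex digest."""
    digest = hashlib.md5()
    for part in parts:
        digest.update(str(part).encode("utf-8"))
    return digest.hexdigest()


def compute_hashes(checksum=None, fingerprint=None, default_parts=()):
    """Return candidate grouping hashes in the priority order from the slide."""
    if checksum:
        # 1. A checksum provided by the client wins outright.
        return [checksum]
    if fingerprint:
        # 2. A fingerprint may embed the default parts via "{{ default }}".
        parts = []
        for item in fingerprint:
            if item == DEFAULT_MARKER:
                parts.extend(default_parts)
            else:
                parts.append(item)
        return [md5_of(parts)]
    # 3. Otherwise fall back to the default parts alone.
    return [md5_of(default_parts)]


def find_group(hashes, group_by_hash):
    """Return the first group whose hash matches, or None for a new group."""
    for value in hashes:
        if value in group_by_hash:
            return group_by_hash[value]
    return None
```

A fingerprint of `['{{ default }}', 'other']` therefore splits one default group into finer groups, while a fully custom fingerprint ignores the default parts.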
  21. Protocol
      CLIENT_RESERVED_ATTRS = (
          'project',
          'event_id',
          'message',
          'checksum',
          'culprit',
          'fingerprint',
          'level',
          'time_spent',
          'logger',
          'server_name',
          'site',
          'timestamp',
          'extra',
          'modules',
          'tags',
          'platform',
          'release',
      )
      {
          "event_id": "fc6d8c0c43fc4630ad850ee518f1b9d0",
          "culprit": "my.module.function_name",
          "timestamp": "2011-05-02T17:41:36",
          "message": "SyntaxError: Wattttt!",
          "sentry.interfaces.Exception": {
              "type": "SyntaxError",
              "value": "Wattttt!",
              "module": "__builtins__"
          }
      }
  22. UDP Protocol
      "AUTH\n\nDATA"
      ● AUTH
        ○ "Sentry key=value, key=value, …"
      ● DATA
        ○ json string
        ○ zlib
        ○ base64
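The packet layout above (an auth header, a blank line, then the JSON payload compressed with zlib and encoded as base64) can be sketched as a round trip. This is an illustration only: the function names are hypothetical, and the header keys (`sentry_version`, `sentry_client`, `sentry_timestamp`, `sentry_key`, `sentry_secret`) are assumed here rather than taken from the slides.

```python
import base64
import json
import time
import zlib


def build_udp_packet(event, public_key, secret_key,
                     protocol_version=6, client="sketch/0.1", timestamp=None):
    """Encode an event as AUTH + blank line + base64(zlib(json))."""
    if timestamp is None:
        timestamp = int(time.time())
    # Header key names are an assumption for this sketch.
    pairs = [
        ("sentry_version", protocol_version),
        ("sentry_client", client),
        ("sentry_timestamp", timestamp),
        ("sentry_key", public_key),
        ("sentry_secret", secret_key),
    ]
    auth = "Sentry " + ", ".join("%s=%s" % pair for pair in pairs)
    data = base64.b64encode(zlib.compress(json.dumps(event).encode("utf-8")))
    return auth.encode("ascii") + b"\n\n" + data


def parse_udp_packet(packet):
    """Reverse build_udp_packet: return (auth fields, decoded event)."""
    auth, separator, data = packet.partition(b"\n\n")
    if not separator:
        raise ValueError("missing blank line between AUTH and DATA")
    scheme, _, pairs = auth.decode("ascii").partition(" ")
    if scheme != "Sentry":
        raise ValueError("unexpected auth scheme: %r" % scheme)
    fields = dict(pair.strip().split("=", 1) for pair in pairs.split(","))
    event = json.loads(zlib.decompress(base64.b64decode(data)).decode("utf-8"))
    return fields, event
```

Because base64 output contains no newlines, the first blank line in the packet always marks the boundary between the two parts.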
  23. HTTP Protocol
      ● User authentication is shared with the web application
      ● /api/store
      ● GET/POST DATA
  24. Interface
      An interface is a structured representation of data, which may render differently than the default ``extra`` metadata in an event.
      ● to_python
      ● get_api_context
      ● to_json
      ● get_path
      ● get_alias
      ● get_hash
      ● get_score
      ● ...
  25. Interface - Exception
      ● A standard Python exception
      ● type, value, module
      ● stacktrace == sentry.interfaces.Stacktrace
      >>> {
      >>>     "type": "ValueError",
      >>>     "value": "My exception value",
      >>>     "module": "__builtins__",
      >>>     "stacktrace": {
      >>>         # see sentry.interfaces.Stacktrace
      >>>     }
      >>> }
  26. Interface - Message
      ● message (<= 1000)
      ● params
      >>> {
      >>>     "message": "My raw message with interpreted strings like %s",
      >>>     "params": ["this"]
      >>> }
  27. Interface - HTTP
      ● Common HTTP request parameters
      >>> {
      >>>     "url": "http://absolute.uri/foo",
      >>>     "method": "POST",
      >>>     "data": {
      >>>         "foo": "bar"
      >>>     },
      >>>     "query_string": "hello=world",
      >>>     "cookies": "foo=bar",
      >>>     "headers": {
      >>>         "Content-Type": "text/html"
      >>>     },
      >>>     "env": {
      >>>         "REMOTE_ADDR": "192.168.0.1"
      >>>     }
      >>> }
  28. Interface - Query
      ● Used to record SQL
      >>> {
      >>>     "query": "SELECT 1",
      >>>     "engine": "psycopg2"
      >>> }
  29. Interface - Template
      ● A rendered template (generally used like a single frame in a stacktrace).
      ● The attributes ``filename``, ``context_line``, and ``lineno`` are required.
      >>> {
      >>>     "abs_path": "/real/file/name.html",
      >>>     "filename": "file/name.html",
      >>>     "pre_context": [
      >>>         "line1",
      >>>         "line2"
      >>>     ],
      >>>     "context_line": "line3",
      >>>     "lineno": 3,
      >>>     "post_context": [
      >>>         "line4",
      >>>         "line5"
      >>>     ]
      >>> }
  30. Interface - User
      ● Describes a user
      >>> {
      >>>     "id": "unique_id",
      >>>     "username": "my_user",
      >>>     "email": "foo@example.com",
      >>>     "ip_address": "127.0.0.1",
      >>>     "optional": "value"
      >>> }
  31. Interface - Stacktrace
      ● Python Frame
      >>> {
      >>>     "frames": [{
      >>>         "abs_path": "/real/file/name.py",
      >>>         "filename": "file/name.py",
      >>>         "function": "myfunction",
      >>>         "vars": {
      >>>             "key": "value"
      >>>         },
      >>>         "pre_context": [
      >>>             "line1",
      >>>             "line2"
      >>>         ],
      >>>         "context_line": "line3",
      >>>         "lineno": 3,
      >>>         "in_app": true,
      >>>         "post_context": [
      >>>             "line4",
      >>>             "line5"
      >>>         ]
      >>>     }],
      >>>     "frames_omitted": [13, 56]
      >>> }
  32. TSDB - Time-series database
      ● Dummy (none)
      ● InMemory (defaultdict)
      ● Redis (hashes)
      Redis:
      {
          "TSDBModel:epoch:shard": {
              "Key": Count
          }
      }
      # rollups must be ordered from highest granularity to lowest
      SENTRY_TSDB_ROLLUPS = (
          # (time in seconds, samples to keep)
          (10, 360),  # 60 minutes at 10 seconds
          (3600, 24 * 7),  # 7 days at 1 hour
          (3600 * 24, 60),  # 60 days at 1 day
      )
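To make the rollup idea concrete, here is a minimal in-memory sketch in the spirit of the "InMemory (defaultdict)" backend. It is not Sentry's code: the class and method names are hypothetical, and it ignores the "samples to keep" retention limit, which a real backend would enforce by expiring old buckets.

```python
from collections import defaultdict

# (bucket width in seconds, samples to keep), copied from the slide
ROLLUPS = ((10, 360), (3600, 24 * 7), (3600 * 24, 60))


def normalize_to_epoch(timestamp, seconds):
    """Round a Unix timestamp down to the start of its bucket."""
    timestamp = int(timestamp)
    return timestamp - timestamp % seconds


class InMemoryTSDB:
    """Counts events per (model, key) in every configured rollup."""

    def __init__(self, rollups=ROLLUPS):
        self.rollups = rollups
        # data[model][key][(rollup_seconds, bucket_start)] -> count
        self.data = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def incr(self, model, key, timestamp, count=1):
        # One increment lands in one bucket per rollup granularity.
        for seconds, _samples in self.rollups:
            bucket = normalize_to_epoch(timestamp, seconds)
            self.data[model][key][(seconds, bucket)] += count

    def get_range(self, model, key, start, end, rollup):
        """Return [(bucket_start, count), ...] covering start..end."""
        series = []
        bucket = normalize_to_epoch(start, rollup)
        while bucket <= end:
            series.append((bucket, self.data[model][key].get((rollup, bucket), 0)))
            bucket += rollup
        return series
```

Writing every increment into all rollups at once is what lets a chart switch between 10-second and 1-day resolution without rescanning raw events.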
  33. NodeStore - KV database
      ● riak
      ● cassandra
      ● django (node table)
      ● Stores special payloads alongside the data (for example, large text that does not belong in the relational database)
      ● validate
      ● create
      ● delete
      ● delete_multi
      ● get
      ● get_multi
      ● set
      ● set_multi
      ● generate_id
      ● cleanup
  34. Example
      class Event(Model):
          """
          An individual event.
          """
          __core__ = False
          ...
          time_spent = BoundedIntegerField(null=True)
          data = NodeField(blank=True, null=True)
  35. Cache
      ● django
        ○ filesystem
        ○ memcached
        ○ local memory
        ○ dummy
      ● redis
      ● set
      ● get
      ● delete
      ● redis:
        ○ from nydus.db import create_cluster
        ○ Supports cluster
        ○ Rewritten with rb, but not yet used in a released version
  36. Example
      ● Cache and Model
        ○ db/models/manager.py
        ○ class BaseManager(Manager)
      ● get_from_cache
      ● updated by signal
      ● deleted by signal
  37. Buffer
      This is useful in situations where a single event might be happening so fast that the queue can't keep up with the updates.
      ● InProcess (no buffer)
      ● Redis
      ● Reduces the write QPS on MySQL
      ● Supports Cluster Redis
      ● Redis 2.6.12 or newer
  38. Buffer Internal
      ● Producer
      ● incr
      ● 'b:k:%s:%s' (hashmap, key_expire = 60 * 60  # 1 hour)
        ○ 'm'
        ○ 'f'
        ○ 'l+%s'
        ○ 'e+%s'
      ● 'b:p' (Sorted sets)
      ● Consumer
      ● process pending
      ● process
      'flush-buffers': {
          'task': 'sentry.tasks.process_buffer.process_pending',
          'schedule': timedelta(seconds=10),
          'options': {
              'expires': 10,
              'queue': 'counters-0',
          }
      },
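The producer and consumer split above amounts to write coalescing: many `incr` calls accumulate in a fast store, and a periodic task flushes each pending key as one database write. The sketch below shows the idea with a plain dict in place of the Redis hash and sorted set; the class name and the `apply_update` callback are hypothetical and stand in for the real model update.

```python
from collections import defaultdict


def _new_pending():
    return defaultdict(lambda: defaultdict(int))


class InMemoryBuffer:
    """Coalesces counter increments so the database sees one write per key."""

    def __init__(self, apply_update):
        # apply_update(model, filters, columns) performs the real write.
        self.apply_update = apply_update
        self.pending = _new_pending()

    def incr(self, model, columns, filters):
        """Producer side: record an increment without touching the database."""
        key = (model, tuple(sorted(filters.items())))
        for column, amount in columns.items():
            self.pending[key][column] += amount

    def process_pending(self):
        """Consumer side: flush everything accumulated since the last run."""
        pending, self.pending = self.pending, _new_pending()
        for (model, filters), columns in pending.items():
            self.apply_update(model, dict(filters), dict(columns))
        return len(pending)
```

With the 10-second schedule shown on the slide, a group that receives a thousand events between flushes costs one UPDATE instead of a thousand.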
  39. Membership
      Roles
      ● Member *:read
      ● Admin *:write
      ● Owner *:delete
      Scoping
      has access to all teams
      ● Organization structure and access control similar to GitHub's
      ● Organization - Owner, Admin, Member
      ● Team - (Role, Project)
      ● Project
  40. Sensitive Data
      ● 'password',
      ● 'secret',
      ● 'passwd',
      ● 'authorization',
      ● 'api_key',
      ● 'apikey',
      ● 'access_token',
      ● DEFAULT_SCRUBBED_FIELDS
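A field list like this is typically applied by walking the event payload and masking any value whose key looks sensitive. The following is a minimal sketch of that idea, not Sentry's data scrubber: the `scrub` function, the mask string, and the substring-matching rule are assumptions for illustration.

```python
# Field names copied from the slide.
SCRUBBED_FIELDS = (
    "password", "secret", "passwd", "authorization",
    "api_key", "apikey", "access_token",
)
MASK = "********"


def _is_sensitive(key, fields):
    """Case-insensitive substring match against the scrubbed field names."""
    key = str(key).lower()
    return any(field in key for field in fields)


def scrub(value, fields=SCRUBBED_FIELDS):
    """Return a copy of value with sensitive dict entries masked."""
    if isinstance(value, dict):
        return {
            key: MASK if _is_sensitive(key, fields) else scrub(item, fields)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [scrub(item, fields) for item in value]
    return value
```

The function returns a new structure and leaves its input untouched, so the original event can still be inspected in a debugger before it is sent.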
  41. Notifications
      ● Rules
        ○ An event is first seen (the first event in a rollup)
        ○ An event changes state from resolved to unresolved
      ● State
        ○ Unresolved
        ○ Resolved
        ○ Muted
      ● Condition
      ● Action
  42. Tagging Events
      ● Event classification
      ● We’ll automatically index all tags for an event, as well as the frequency and the last time a value has been seen.
      ● TagValue
      ● GroupTagValue
      ● Added by buffer
  43. Rollups & Sampling
      ● Rollups
        ○ Raven.captureException(ex, {fingerprint: ['my', 'custom', 'fingerprint']})
        ○ Raven.captureException(ex, {fingerprint: ['{{ default }}', 'other', 'data']})
      ● Sampling
        ○ Count
        ○ Time
  44. Web Profile
      ● ?prof=1
      ● DEBUG
      ● super user
      ● src/sentry/utils/debug.py
      def can(self, request):
          if 'prof' not in request.GET:
              return False
          if settings.DEBUG:
              return True
          if hasattr(request, 'user') and request.user.is_superuser:
              return True
          return False
  45. Sentry to Sentry
      ● Bootstrapping (self-reporting)
      ● DISABLE_RAVEN
      ● default: project id == 1
      ● src/sentry/utils/raven.py
  46. PostgreSQL & Gevent
      ● psycopg2
      ● src/sentry/utils/gevent.py
      ● This is probably the database the Sentry team itself uses; there is a non-blocking patch, so async is supported
  47. Sentry @douban
      The good stuff, with plenty to grumble about
      Problems
      Deployment
      Monitoring
      Tuning
      Tips
  48. "Don't expect a tool to solve a problem you cannot solve yourself."
      乔治 (George) @ Douban
  49. Monitor
  50. Douban
      ● Applications
        ○ Python (the majority)
        ○ JavaScript (frontend)
        ○ Go (a small share)
        ○ C++/C/Java (very little)
      ● Errors
        ○ devtools (DIY, deprecated)
        ○ Sentry
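  51. Problems
      ● A Sentry instance was already deployed
      ● 5.x, using the UDP protocol
      ● Errors were being lost in both testing and production
      ● And noticeably so
      ● At that point there was no monitoring of Sentry itself
      ● Started investigating the black box
      ● UDP worker CPU usage was high
      ● UDP was load-balanced through DNS
      ● DNS caching left the load unbalanced
      ● Random selection mitigated the problem caused by caching
      ● Reviewed the worker code
      ● Gevent / Eventlet was used incorrectly: no monkey patching
      ● 5.x put heavy pressure on the database; writes needed to be merged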
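The slide does not say where the random selection was applied. One plausible reading is that the sender resolves every address behind the DNS name and picks one at random per packet, so a cached first answer no longer pins all traffic to one worker. The sketch below shows that reading only; the function names and the injectable `resolver` parameter are assumptions, not Douban's code.

```python
import random
import socket


def resolve_udp_addresses(host, port, resolver=socket.getaddrinfo):
    """Return every distinct (ip, port) the name resolves to for UDP over IPv4."""
    infos = resolver(host, port, socket.AF_INET, socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr).
    return sorted({info[4] for info in infos})


def pick_address(addresses, rng=random):
    """Choose a target at random instead of always using the first answer."""
    return rng.choice(addresses)


def send_event(payload, host, port):
    """Fire-and-forget send to a randomly chosen backend."""
    address = pick_address(resolve_udp_addresses(host, port))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
```

Passing the resolver in as a parameter keeps the selection logic testable without real DNS.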
  52. Upgrade
      ● 5.x-maint supports UDP
      ● but 7.x does not
      ● We had to backport UDP to 7.x
      ● Fortunately the original interface was still there
      ❏ src/sentry/conf/server.py:
        ❏ #socket.setdefaulttimeout(5)
      ❏ src/sentry/coreapi.py:
        ❏ insert_data_to_database_sync (async to sync: Redis memory amplification was too severe, because of the cache)
  53. Insert Queue
      ● insert_data_to_database - cache
      ● preprocess_event - queue
      ● save_event - queue
      ● insert_data_to_database_sync - queue
  54. Deployment
      ● HTTP x 4 (by default Sentry manages its workers with Gunicorn)
      ● UDP x 4 (Gevent enabled; each received packet is pushed onto the queue)
      ● Celery x 4 (task consumers; by default CPU_NUM workers are started)
      ● Celery Beat x 1
      ● Cron: cleanup 21 (keep only 21 days of data)
      ● HTTP is load-balanced by LVS + Nginx in front
      ● UDP is load-balanced through DNS
  55. Internal configuration
      ● LDAP
        ○ The user account system we use
        ○ Only needs to be configured in Sentry
      ● MAIL
        ○ Configure the Sentry mail server
      ● IRC
        ○ sentry-irc
        ○ Because we use ircbot, we modified the plugin code slightly
  56. Why UDP
      ● Fast
      ● The application does not need to care whether the Sentry service is healthy
      ● Even if Sentry has problems, the application is not affected
      ● System-level UDP packet loss can be observed to judge whether the UDP service is healthy
  57. Celery
      ● 芹菜 (celery, the vegetable)
      ● Not fully digested yet
  58. DBA
      ● Redis
        ○ Memory
        ○ QPS
        ○ CPU
        ○ Queue Size
      ● MySQL
        ○ QPS
          ■ update
          ■ insert
          ■ delete
          ■ select
        ○ thread
  59. Sentry
      ● Statsd
        ○ celery worker cpu
        ○ udp worker cpu
        ○ http worker cpu
      ● In-app statistics
        ○ task execution time
        ○ task execution count
  60. UDP Received Packet (Sentry)
      ● udp worker
      ● Count each packet after it is received
      ● d = sock.recvfrom(self.BUF_SIZE)
      ● statsd.increment(STATSD_KEY_RECV)
  61. UDP Received Packet (Kernel)
      ● Number of packets received by the UDP server (collected by Diamond)
      ● ~ $ sudo /sbin/iptables -t filter -I INPUT -i lan -p udp --dport 4008 -j ACCEPT
      ● ~ $ sudo /sbin/iptables -L INPUT 1 -nvx
      ● 52308183 414772916064 ACCEPT udp -- lan * 0.0.0.0/0 0.0.0.0/0 udp dpt:4008
      ● pkts (complete UDP datagrams; fragmentation is already handled at a lower layer)
      ● bytes
  62. UDP Dropped Packets (Kernel)
      ● cat /proc/net/udp
        sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
        41: 00000000:80CE 00000000:0000 07 00000000:00000000 00:00000000 00000000 6561 0 4110825944 2 ffff8809c23e5e40 0
      ● cat /proc/net/snmp
        Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
        Udp: 5416706536 993028 290598311 22725578190 4662160 1318
        UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
        UdpLite: 0 0 0 0 0 0
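The `/proc/net/snmp` output pairs a header line with a value line for each protocol, so the UDP counters can be turned into a dict and polled by a monitoring agent. The sketch below parses the sample from the slide; the function name is hypothetical, and on a live Linux host the text would come from reading `/proc/net/snmp` instead of the embedded sample.

```python
SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 5416706536 993028 290598311 22725578190 4662160 1318
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
UdpLite: 0 0 0 0 0 0
"""


def parse_snmp_udp(text):
    """Return the Udp counters from /proc/net/snmp text as {name: int}."""
    header = None
    for line in text.splitlines():
        # "UdpLite:" does not start with "Udp:", so only the Udp rows match.
        if not line.startswith("Udp:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields  # first Udp line carries the column names
        else:
            return dict(zip(header, (int(value) for value in fields)))
    raise ValueError("no complete Udp section found")


if __name__ == "__main__":
    stats = parse_snmp_udp(SAMPLE)
    # RcvbufErrors counts datagrams dropped because the socket's receive
    # buffer was full, which is the symptom of a UDP worker falling behind.
    print(stats["InErrors"], stats["RcvbufErrors"])
```

Since these counters are cumulative since boot, a monitor should graph the difference between successive readings instead of the raw values.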
  63. TIPS
      ● Webhooks:
        ○ Requests to internal network IPs are blocked by default; the configuration needs to be changed
      ● Timezone
        ○ SENTRY_DEFAULT_TIME_ZONE = 'Asia/Shanghai' sets the default time zone for users
      ● Public
        ○ SENTRY_PUBLIC = False (this permission is somewhat problematic; do not enable it)
      ● Register
        ○ SENTRY_FEATURES['auth:register'] = False disables self-registration
  64. Next steps
      ● Per-project error statistics (QPS; the charts Sentry provides do not meet our needs yet)
      ● Profiling tools to help analyze worker bottlenecks (Celery)
      ● A plan for handling error avalanches (load-test Sentry)
      ● Try MySQL + Redis + Gevent
  65. Jobs
      ● 2016 campus recruiting
      ● Experienced hires: always open
      ● TO: ruby@douban.com
      ● Python is of course fine too
      ● TO: python@douban.com
      ● If you want to try JS, that works as well
      ● TO: js@douban.com
      ● Details: http://jobs.douban.com