Ying’s Blog

DolphinScheduler Notes, Part 9: Fault Tolerance

💡 Original article in Chinese, about 5,300 characters; roughly a 13-minute read.

📝

Summary

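This post covers the fault-tolerance mechanism of DolphinScheduler, a distributed scheduling system. DolphinScheduler uses a Master-Worker design and relies on ZooKeeper to handle failover, but failover requires restarting the affected tasks.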

🎯

Key Points

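Distributed systems must cope with unreliable individual machines: crashes, disk failures, and network jitter.
DolphinScheduler uses a Master-Worker design in which the Masters act as backups for one another, so the Master tier is itself fault tolerant.
The fault-tolerance mechanism must ensure that when a Master or Worker dies, its workflows are taken over and recovered.
Fault tolerance is implemented on top of ZooKeeper, which provides service discovery, load balancing, and fault-tolerance support.
Masters and Workers register with ZooKeeper at startup and keep their state current through heartbeats.
Master failover relies on lock contention: the surviving Masters compete for a lock, which ensures that each workflow instance is recovered only once.
Worker failover follows a similar flow, ensuring that tasks are recovered.
ZooKeeper and Apache Curator supply the basic infrastructure for distributed systems and simplify the fault-tolerance implementation.
DolphinScheduler's failover requires restarting tasks; how to make that restart transparent remains an open question.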
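
The heartbeat and lock-contention points above can be illustrated with a small sketch. This is a toy, in-memory model, not DolphinScheduler's code and not the ZooKeeper or Curator API: `ToyRegistry`, `failover`, the `RECOVERING` state, and the host names are all hypothetical, and a local `threading.Lock` stands in for the distributed lock. It only shows why taking the lock before reassigning ownership means a dead Master's workflow instances are recovered once.

```python
import threading


class ToyRegistry:
    """In-memory stand-in for the registry (assumption: not real ZooKeeper)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}                    # host -> time of last heartbeat
        self.failover_lock = threading.Lock()  # stands in for a distributed lock

    def heartbeat(self, host, now):
        # Registering and refreshing are the same operation in this toy model.
        self.last_seen[host] = now

    def expire(self, now):
        # Hosts whose last heartbeat is older than the timeout count as dead.
        dead = sorted(h for h, t in self.last_seen.items() if now - t > self.timeout)
        for h in dead:
            del self.last_seen[h]
        return dead


def failover(registry, instances, dead_host, survivor):
    """Under the lock, hand the dead host's running instances to the survivor."""
    recovered = []
    with registry.failover_lock:
        for inst in instances:
            if inst["host"] == dead_host and inst["state"] == "RUNNING":
                inst["host"] = survivor
                inst["state"] = "RECOVERING"  # hypothetical state: restarted, not resumed
                recovered.append(inst["id"])
    return recovered


registry = ToyRegistry(timeout=10)
for host in ("master-1", "master-2", "master-3"):
    registry.heartbeat(host, now=0)

# master-1 stops sending heartbeats; the other two keep going.
registry.heartbeat("master-2", now=12)
registry.heartbeat("master-3", now=12)
dead = registry.expire(now=12)

instances = [
    {"id": 1, "host": "master-1", "state": "RUNNING"},
    {"id": 2, "host": "master-1", "state": "RUNNING"},
    {"id": 3, "host": "master-2", "state": "RUNNING"},
]

# Both survivors notice the failure and attempt recovery.
first = failover(registry, instances, "master-1", "master-2")
second = failover(registry, instances, "master-1", "master-3")
```

Here `first` contains instances 1 and 2, while `second` is empty: once the first survivor has changed ownership under the lock, the second finds nothing left to claim. Marking the instances as `RECOVERING` instead of leaving them `RUNNING` mirrors the last point above, that recovery means restarting the work and not resuming it in place.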

🏷️

Tags

DolphinScheduler Master-Worker ZooKeeper failover fault-tolerance
