Failure Models
This article was ported from dev.to.
Introduction
The previous article explained the timing (communication) models for the consensus problem. This article introduces four general failure models for the nodes that take part in consensus.
Crash-stop faults
This model assumes only crash-stop faults: a node may stop working at any time. A node that has stopped in this model never comes back.
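As a minimal sketch of this behaviour, the hypothetical `CrashStopNode` class below (the class, its method names, and the use of `None` for "no response" are illustrative assumptions, not part of any real library) answers messages until it crashes and then stays silent forever.

```python
class CrashStopNode:
    """Hypothetical node that replies until it crashes, then never replies again."""

    def __init__(self, name):
        self.name = name
        self.crashed = False

    def crash(self):
        # In the crash-stop model this transition is permanent:
        # there is deliberately no method that undoes it.
        self.crashed = True

    def handle(self, message):
        # None stands for "no response observed by the sender".
        if self.crashed:
            return None
        return f"{self.name} ack: {message}"


node = CrashStopNode("n1")
before_crash = node.handle("ping")
node.crash()
after_crash = node.handle("ping")
```

Here `before_crash` holds a normal acknowledgement, while `after_crash` is `None` and will stay that way for every later message.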
Omission faults
This model assumes crash-stop faults and omission faults. A node with an omission fault may or may not reply to a given message. When a message goes unanswered, other nodes cannot tell whether the cause is a crash-stop fault or an omission fault.
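One rough way to picture an omission fault is a node that silently drops some of the messages it receives. The `OmissionNode` class and its `drop_probability` parameter below are hypothetical names chosen for this sketch; real systems lose messages for many reasons (full buffers, network loss) that are not modelled here.

```python
import random


class OmissionNode:
    """Hypothetical node that is alive but sometimes fails to reply."""

    def __init__(self, name, drop_probability, seed=0):
        self.name = name
        self.drop_probability = drop_probability
        # A private, seeded generator keeps the sketch reproducible.
        self.rng = random.Random(seed)

    def handle(self, message):
        # random() returns a float in [0.0, 1.0), so a probability of 1.0
        # drops every message and a probability of 0.0 drops none.
        if self.rng.random() < self.drop_probability:
            return None  # the reply is omitted
        return f"{self.name} ack: {message}"


silent = OmissionNode("n2", drop_probability=1.0)
chatty = OmissionNode("n3", drop_probability=0.0)
silent_replies = [silent.handle("ping") for _ in range(3)]
chatty_replies = [chatty.handle("ping") for _ in range(3)]
```

From the sender's side, the `None` values in `silent_replies` look exactly like the silence of a crashed node, which is why the two faults cannot be told apart by observation alone.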
Crash-recovery faults
This model assumes crash-stop faults, omission faults, and crash-recovery faults. Under a crash-recovery fault, a node may crash at any time and may resume responding at any time. The model also assumes that a crash can cause partial data loss. If a node in this model does not respond, other nodes cannot determine whether the cause is a crash-stop fault, an omission fault, or a crash-recovery fault.
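The partial data loss can be sketched by splitting a node's state into a part that survives a crash and a part that does not. The `CrashRecoveryNode` class below, with its `durable` and `volatile` dictionaries and its `flush` flag, is a hypothetical illustration and not a description of any particular storage engine.

```python
class CrashRecoveryNode:
    """Hypothetical node that can crash, lose unflushed state, and come back."""

    def __init__(self):
        self.durable = {}   # assumed to survive a crash (e.g. flushed to disk)
        self.volatile = {}  # in-memory state, lost when the node crashes
        self.up = True

    def write(self, key, value, flush=False):
        if not self.up:
            return False  # a crashed node does not accept writes
        self.volatile[key] = value
        if flush:
            self.durable[key] = value
        return True

    def read(self, key):
        if not self.up:
            return None  # no response while the node is down
        return self.volatile.get(key)

    def crash(self):
        self.up = False
        self.volatile = {}  # everything that was not flushed is gone

    def recover(self):
        # The node rejoins with only the state it managed to persist.
        self.up = True
        self.volatile = dict(self.durable)


node = CrashRecoveryNode()
node.write("a", 1, flush=True)
node.write("b", 2)            # never flushed
node.crash()
while_down = node.read("a")
node.recover()
after_recovery = (node.read("a"), node.read("b"))
```

After recovery the node still knows `a` but has forgotten `b`, so a protocol in this model has to cope with nodes that return holding older state than they had before the crash.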
Byzantine faults
This model assumes crash-stop faults, omission faults, crash-recovery faults, and Byzantine faults. A node with a Byzantine fault can behave arbitrarily. For example, it can ignore messages, pretend to be malfunctioning, reply with fake messages, or collude with other Byzantine nodes to act maliciously.
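Arbitrary behaviour cannot be captured fully in a few lines, but one well-known instance is equivocation, where a faulty node tells different peers different things. The sketch below uses hypothetical `HonestNode` and `EquivocatingNode` classes and a made-up `report_to` method; it shows only this single behaviour, not Byzantine faults in general.

```python
class HonestNode:
    """Hypothetical correct node: reports the same value to everyone."""

    def __init__(self, value):
        self.value = value

    def report_to(self, peer):
        return self.value


class EquivocatingNode:
    """Hypothetical Byzantine node: tailors its answer to each peer."""

    def __init__(self, values_by_peer, default):
        self.values_by_peer = values_by_peer
        self.default = default

    def report_to(self, peer):
        # A different, possibly fabricated, value per recipient.
        return self.values_by_peer.get(peer, self.default)


peers = ["p1", "p2", "p3"]
honest = HonestNode("commit")
liar = EquivocatingNode({"p1": "commit", "p2": "abort"}, default="commit")

honest_views = {peer: honest.report_to(peer) for peer in peers}
liar_views = {peer: liar.report_to(peer) for peer in peers}

# Peers can only notice the conflict if they compare what they were told.
honest_is_consistent = len(set(honest_views.values())) == 1
liar_is_consistent = len(set(liar_views.values())) == 1
```

Each peer on its own sees a plausible reply from `liar`; the inconsistency appears only when the peers exchange their views, and colluding Byzantine nodes could interfere with that exchange as well.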
Conclusion
The difficulty of dealing with each failure model is illustrated in the figure below.
In other words, because each model includes the faults of the models before it, the Byzantine failure model is the most difficult one to deal with.
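The nesting of the four models described in this article can be written down as a simple ordering. The `FailureModel` enum and the `covers` helper below are hypothetical names for this sketch; the numeric values only encode the order in which the article introduces the models.

```python
from enum import IntEnum


class FailureModel(IntEnum):
    """Failure models in the order presented, weakest assumption first."""

    CRASH_STOP = 1
    OMISSION = 2
    CRASH_RECOVERY = 3
    BYZANTINE = 4


def covers(model, fault):
    # A model covers a fault class if that class was introduced
    # at the same level or at an earlier one.
    return fault <= model


hardest = max(FailureModel)
byzantine_covers_crash = covers(FailureModel.BYZANTINE, FailureModel.CRASH_STOP)
crash_stop_covers_omission = covers(FailureModel.CRASH_STOP, FailureModel.OMISSION)
```

Under this ordering `hardest` is `FailureModel.BYZANTINE`, and a protocol designed for the Byzantine model must handle every fault class listed, while one designed only for crash-stop faults makes no promise about omissions.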