...
 
Commits (2)
  • fix: lock maintenance should honor quorum (#9138) · c9212819
    Harshavardhana authored
    The staleness of a lock should be determined by a
    quorum of entries reporting it stale; this covers
    situations where locks are held while nodes are down,
    so we don't accidentally clear locks that are still
    valid and correct.
    
    Also, lock maintenance should be run by all servers,
    not just one: stale-lock cleanup has to run without
    the requirement of holding a distributed lock (a short
    sketch of the quorum rule follows the commit list below).
    
    Thanks @klauspost for reproducing this issue
  • Some fix of chinese docs (#9140) · 7db902b2
    yeungc authored
    ## Description
    Clarify the use of disk (硬盘) vs. node (节点).
    Remove the limit (限制) paragraph, since there is no longer a maximum of 16 disks.
    
    ## Types of changes
    - [x] Bug fix (non-breaking change which fixes an issue)
    - [ ] New feature (non-breaking change which adds functionality)
    - [ ] Breaking change (fix or feature that would cause existing functionality to change)
    
    ## Checklist:
    - [ ] Fixes a regression (If yes, please add `commit-id` or `PR #` here)
    - [ ] Documentation needed
    - [ ] Unit tests needed
    - [ ] Functional tests needed (If yes, add [mint](https://github.com/minio/mint) PR # here: )
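As a rough illustration of the quorum rule described in the first commit message: a lock is only purged as stale when fewer than a quorum of endpoints still report it as held. The helper below and its name are hypothetical, not MinIO's API; in the diff that follows, the per-set drive count is `globalXLSetDriveCount`.

```go
package main

import "fmt"

// staleLockQuorum returns how many endpoints must still report a lock as
// held (or be unreachable) for the lock entry to be kept. This is a
// hypothetical helper sketching the rule from the commit message, not the
// upstream implementation.
func staleLockQuorum(driveCount int, isWriteLock bool) int {
	if isWriteLock {
		// Write locks need N/2+1 "still held" answers to survive maintenance.
		return driveCount/2 + 1
	}
	// Read locks need only N/2.
	return driveCount / 2
}

func main() {
	// With 8 drives per set: a write lock is purged only if fewer than 5
	// endpoints report it as held, a read lock only if fewer than 4 do.
	fmt.Println(staleLockQuorum(8, true))  // 5
	fmt.Println(staleLockQuorum(8, false)) // 4
}
```

Because an offline endpoint counts toward the "still held" tally in the diff below, a node being down can no longer cause a valid lock to be cleared.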
@@ -225,8 +225,6 @@ func getLongLivedLocks(interval time.Duration) map[Endpoint][]nameLockRequesterI
return nlripMap
}
var lockMaintenanceTimeout = newDynamicTimeout(60*time.Second, time.Second)
// lockMaintenance loops over locks that have been active for some time and checks back
// with the original server whether it is still alive or not
//
@@ -235,28 +233,19 @@ var lockMaintenanceTimeout = newDynamicTimeout(60*time.Second, time.Second)
// - some network error (and server is up normally)
//
// We will ignore the error, and we will retry later to get a resolve on this lock
func lockMaintenance(ctx context.Context, interval time.Duration, objAPI ObjectLayer) error {
// Lock to avoid concurrent lock maintenance loops
maintenanceLock := objAPI.NewNSLock(ctx, "system", "lock-maintenance-ops")
if err := maintenanceLock.GetLock(lockMaintenanceTimeout); err != nil {
return err
}
defer maintenanceLock.Unlock()
func lockMaintenance(ctx context.Context, interval time.Duration) error {
// Validate if long lived locks are indeed clean.
// Get list of long lived locks to check for staleness.
for lendpoint, nlrips := range getLongLivedLocks(interval) {
nlripsMap := make(map[string]int, len(nlrips))
for _, nlrip := range nlrips {
// Locks are only held on first zone, make sure that
// we only look for ownership of locks from endpoints
// on first zone.
for _, endpoint := range globalEndpoints[0].Endpoints {
if endpoint.String() == lendpoint.String() {
continue
}
c := newLockAPI(endpoint)
if !c.IsOnline() {
nlripsMap[nlrip.name]++
continue
}
@@ -266,25 +255,37 @@ func lockMaintenance(ctx context.Context, interval time.Duration, objAPI ObjectL
UID: nlrip.lri.UID,
Resources: []string{nlrip.name},
})
if err != nil {
nlripsMap[nlrip.name]++
c.Close()
continue
}
// For successful response, verify if lock was indeed active or stale.
if expired {
// The lock is no longer active at server that originated
// the lock, attempt to remove the lock.
globalLockServers[lendpoint].mutex.Lock()
// Purge the stale entry if it exists.
globalLockServers[lendpoint].removeEntryIfExists(nlrip)
globalLockServers[lendpoint].mutex.Unlock()
if !expired {
nlripsMap[nlrip.name]++
}
// Close the connection regardless of the call response.
c.Close()
}
// For read locks we assume quorum to be N/2 successful responses.
quorum := globalXLSetDriveCount / 2
if nlrip.lri.Writer {
// For write locks we need N/2+1 successful responses.
quorum = globalXLSetDriveCount/2 + 1
}
// If fewer than quorum endpoints still consider the lock held, it has expired.
if nlripsMap[nlrip.name] < quorum {
// The lock is no longer active at server that originated
// the lock, attempt to remove the lock.
globalLockServers[lendpoint].mutex.Lock()
// Purge the stale entry if it exists.
globalLockServers[lendpoint].removeEntryIfExists(nlrip)
globalLockServers[lendpoint].mutex.Unlock()
}
}
}
@@ -293,12 +294,13 @@ func lockMaintenance(ctx context.Context, interval time.Duration, objAPI ObjectL
// Start lock maintenance from all lock servers.
func startLockMaintenance() {
var objAPI ObjectLayer
var ctx = context.Background()
// Wait until the object API is ready
// no need to start the lock maintenance
// if ObjectAPI is not initialized.
for {
objAPI = newObjectLayerWithoutSafeModeFn()
objAPI := newObjectLayerWithoutSafeModeFn()
if objAPI == nil {
time.Sleep(time.Second)
continue
@@ -322,7 +324,7 @@ func startLockMaintenance() {
// "synchronous checks" between servers
duration := time.Duration(r.Float64() * float64(lockMaintenanceInterval))
time.Sleep(duration)
if err := lockMaintenance(ctx, lockValidityCheckInterval, objAPI); err != nil {
if err := lockMaintenance(ctx, lockValidityCheckInterval); err != nil {
// Sleep right after an error.
duration := time.Duration(r.Float64() * float64(lockMaintenanceInterval))
time.Sleep(duration)
......
@@ -9,21 +9,17 @@
### Data protection
Distributed Minio uses [erasure code](https://docs.min.io/cn/minio-erasure-code-quickstart-guide) to protect against multiple node failures and [bit rot](https://github.com/minio/minio/blob/master/docs/zh_CN/erasure/README.md#what-is-bit-rot-protection).
Distributed Minio uses [纠删码](https://docs.min.io/cn/minio-erasure-code-quickstart-guide) (erasure coding) to protect against multiple node failures and [bit rot](https://github.com/minio/minio/blob/master/docs/zh_CN/erasure/README.md#what-is-bit-rot-protection).
Distributed Minio requires at least 4 nodes; setting up distributed Minio automatically enables erasure coding.
Distributed Minio requires at least 4 hard drives; setting up distributed Minio automatically enables erasure coding.
### High availability
A standalone Minio server is a single point of failure; by contrast, with an N-node distributed Minio setup, your data stays safe as long as N/2 nodes are online, though you need at least N/2+1 nodes ([Quorum](https://github.com/minio/dsync#lock-process)) to create new objects.
A standalone Minio server is a single point of failure; by contrast, with a distributed Minio setup of N hard drives, your data stays safe as long as N/2 hard drives are online, though you need at least N/2+1 hard drives to create new objects.
For example, an 8-node Minio cluster with one disk per node remains readable even if 4 nodes go down, but you need 5 nodes to write data.
For example, a 16-node Minio cluster with 16 hard drives per node remains readable even if 8 servers go down, but you need 9 servers to write data.
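A quick back-of-the-envelope check of the example above, assuming (as this guide describes) that the read and write quorums are counted over all N hard drives in the cluster:

```go
package main

import "fmt"

func main() {
	// 16 nodes with 16 hard drives each, as in the example above.
	nodes, drivesPerNode := 16, 16
	total := nodes * drivesPerNode // N = 256 drives

	readQuorum := total / 2    // need N/2 drives online to read
	writeQuorum := total/2 + 1 // need N/2+1 drives online to write

	online := (nodes - 8) * drivesPerNode       // 8 servers down -> 128 drives online
	fmt.Println(online >= readQuorum)           // true: the cluster is still readable
	fmt.Println(online >= writeQuorum)          // false: 8 surviving servers cannot accept writes
	fmt.Println(9*drivesPerNode >= writeQuorum) // true: 9 servers are enough to write
}
```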
### Limits
A single-tenant distributed Minio deployment is limited to a minimum of 4 and a maximum of 16 disks (a constraint of the erasure code). This limit keeps Minio simple while still allowing it to scale. If you need a multi-tenant environment, you can easily use an orchestration tool such as Kubernetes to manage multiple Minio instances.
Note that as long as you stay within the distributed Minio limits, you can combine different numbers of nodes and disks per node. For example, you can use 2 nodes with 4 disks each, or 4 nodes with 2 disks each, and so on.
Note that as long as you stay within the distributed Minio limits, you can combine different numbers of nodes and hard drives per node. For example, you can use 2 nodes with 4 hard drives each, or 4 nodes with 2 hard drives each, and so on.
### Consistency
......