I currently have 4 servers at Hetzner in 2 different datacenters, 2 at OVH, and 1 at Init7. But it seems the latency really kills Ceph performance. So I'm wondering if it would be a good idea to move the rest of the servers to Hetzner as well and sacrifice multihoming. Any thoughts?

(this is all privately financed so I'd rather not rent an additional bunch of servers to run tests)

@lexi - maybe GlusterFS could be an alternative to Ceph?

With geo-replication, GlusterFS should be async, so the local instance doesn't wait for the remote nodes to finish their writes.

@ij my biggest complaint is with the read performance though. I'm not sure if I've exhausted all tuning opportunities yet, but reading from e.g. rados or rbd seems to incur high latencies and produce tiny reads, which is a terrible combination
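
(For reference, a minimal sketch of how I'd probe that with the python-rados bindings; 'mypool' and 'myobject' are just placeholders, not my actual setup:)

```python
# Rough per-read latency probe via python-rados; 'mypool'/'myobject' are placeholders.
import time
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('mypool')

for _ in range(10):
    start = time.monotonic()
    data = ioctx.read('myobject', length=1024 * 1024, offset=0)
    ms = (time.monotonic() - start) * 1000
    # If Ceph hands back far less than the requested 1 MiB, many round-trips
    # are needed and the per-read latency adds up fast.
    print(f"got {len(data)} bytes in {ms:.1f} ms")

ioctx.close()
cluster.shutdown()
```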

@lexi - you should be able to increase read performance as well. At least that's what GlusterFS claims to be good for.

I'm using it for sharing/syncing Let's Encrypt SSL certs between all of my VMs. A small task with no real performance needs, but I've also thought about syncing TBs of data across servers (partly behind DSL).

@lexi
are those Hetzner hosts physical machines or VMs? if VMs, you could test the connection between the different Hetzner locations without any long-term commitment

@tsia I only have physical machines. (You can get them pretty cheap through their Serverbörse server auction)

@lexi
hm. would it be feasible to just get a couple of VMs for a few hours just to test? I don't know what locations are available for Serverbörse machines though.

@tsia I don't know how useful VMs would be since my past experiences with Hetzner VMs were pretty much that their disk I/O is unusable (heavily overbooked), so I suppose the result would be randomly skewed

@lexi
have you had any experience with the new hetzner.cloud or are you talking about the "old" VMs? they seem to be using different (better maybe?) technology than before. and it's paid by the hour.

@lexi sounds like a good idea to move them to only one location. Ceph needs to write the data to all redundant OSDs and their WAL/DB before it acknowledges the write to the upper layer, so with that latency it kills performance, as you observed.
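
As a rough illustration of why (the RTTs below are made-up numbers, assuming a 3-replica pool with one replica in a remote datacenter):

```python
# Back-of-envelope synchronous write latency for a replicated Ceph pool.
# All values are illustrative assumptions, not measurements.
rtt_client_to_primary_ms = 1.0           # client and primary OSD in the same DC
rtt_primary_to_replica_ms = [0.5, 20.0]  # one local replica, one in a remote DC
wal_commit_ms = 1.0                      # each OSD committing to its WAL/DB

# The primary can only ack the write once the slowest replica has committed.
write_latency_ms = (rtt_client_to_primary_ms
                    + max(rtt_primary_to_replica_ms)
                    + wal_commit_ms)
print(f"~{write_latency_ms:.1f} ms per acknowledged write")  # dominated by the remote RTT
```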

@leah sure, for writes I would agree (I'm not writing that much though); I'm just confused about the read performance. Essentially, individual reads are taking >25 ms even though there's pretty much a guarantee that a local copy of the data exists, and each one only gives me a few hundred bytes even if I request 1 MB or more, so they add up quickly.
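
(For what it's worth, this is roughly how I'd check the same thing against an RBD image with the python-rbd bindings; pool and image names are placeholders:)

```python
# Time sequential 1 MiB reads from an RBD image and report how many
# bytes each call actually returns. 'rbd'/'myimage' are placeholder names.
import time
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

with rbd.Image(ioctx, 'myimage', read_only=True) as image:
    offset = 0
    for _ in range(10):
        start = time.monotonic()
        data = image.read(offset, 1024 * 1024)
        ms = (time.monotonic() - start) * 1000
        print(f"offset {offset}: {len(data)} bytes in {ms:.1f} ms")
        offset += len(data) or 1024 * 1024

ioctx.close()
cluster.shutdown()
```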

@leah yes, one monitor, manager and metadata server each.
