It looks like the lack of persistent storage for the federated activity queue is leading to instances running out of memory in a matter of hours. See my comment for more details.
Furthermore, this leads to data loss, since there is no other consistency mechanism. I think it might be a high-priority issue, given the current momentum behind Lemmy’s growth…
I guess that works as an emergency measure. Persistent storage doesn’t affect whether the updates are processed in time, but it would act as a sort of swap to keep the memory usage manageable.
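To make the "swap" idea concrete, here is a minimal sketch of a disk-backed queue. Everything in it is an assumption for illustration: the class name `PersistentActivityQueue`, the table layout, and the use of SQLite are hypothetical and are not how Lemmy actually stores its queue.

```python
import sqlite3


class PersistentActivityQueue:
    """Hypothetical disk-backed FIFO queue (illustration only, not Lemmy's code)."""

    def __init__(self, path=":memory:"):
        # Pass a file path to keep pending activities across restarts;
        # ":memory:" is only a convenient default for experimenting.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "target TEXT NOT NULL, "
            "payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, target, payload):
        # The connection context manager commits on success, rolls back on error.
        with self.db:
            self.db.execute(
                "INSERT INTO queue (target, payload) VALUES (?, ?)",
                (target, payload),
            )

    def peek(self):
        # Oldest pending activity as (id, target, payload), or None if empty.
        return self.db.execute(
            "SELECT id, target, payload FROM queue ORDER BY id LIMIT 1"
        ).fetchone()

    def ack(self, activity_id):
        # Delete only after the remote instance has confirmed delivery.
        with self.db:
            self.db.execute("DELETE FROM queue WHERE id = ?", (activity_id,))

    def __len__(self):
        return self.db.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

A sender would `peek()`, attempt delivery, and call `ack()` only on success, so a crash between the two steps leaves the activity on disk instead of losing it. Memory usage then depends on how many rows are being processed at once, not on the total backlog.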
For scalability, perhaps you could run Dijkstra’s algorithm and route the updates along the shortest path to each federated node, in a multicast sort of way? That would make the updates scale in an O(log(N)) way, provided that activity isn’t too centralised. It would also be great to run periodic “deep scrubs” between instances to sync up each other’s activities and provide actual eventual consistency. I guess that’s kind of a liberal interpretation of ActivityPub, but I think that’s the only way to ensure real scalability.
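As a rough illustration of the routing idea, the sketch below runs Dijkstra's algorithm from the originating instance and turns the resulting shortest-path tree into a forwarding table, so each instance relays an update only to its children in the tree. The instance names, the link costs, and the function names are all made up for the example; nothing like this exists in ActivityPub or Lemmy as far as this thread establishes.

```python
import heapq


def shortest_path_tree(graph, source):
    """Dijkstra's algorithm: returns (distance, parent) maps from `source`.

    `graph` maps each instance to {neighbour: link_cost}; costs must be >= 0.
    """
    dist = {source: 0}
    parent = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbour, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                parent[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    return dist, parent


def forwarding_table(parent):
    """Invert the parent map: each instance gets the list of peers it relays to."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    return children


# Hypothetical four-instance federation with made-up link costs.
example_graph = {
    "a.example": {"b.example": 1, "c.example": 4},
    "b.example": {"a.example": 1, "c.example": 1, "d.example": 5},
    "c.example": {"a.example": 4, "b.example": 1, "d.example": 1},
    "d.example": {"b.example": 5, "c.example": 1},
}
distances, parents = shortest_path_tree(example_graph, "a.example")
relay_to = forwarding_table(parents)
```

With these costs, `a.example` sends a single copy to `b.example`, which relays it onward, instead of sending three copies itself. How much this helps in practice depends on the shape of the tree, and relaying through third parties would need some way to verify that the activity was not altered in transit.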