Introducing Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration

mgdigital@lemmy.world · 1 year ago

Introducing Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration

droopy4096@lemmy.ca · 1 year ago

@mgdigital, first thing I've noticed: reliance on a "heavier" database stack (pg + redis), at least from a first glance at docker-compose. My suggestion would be to have an option for a minimalist setup with sqlite and without redis if possible. That would work better for those of us flying with minimal hardware (rpi, old PC and such).

mgdigital@lemmy.world · 1 year ago

Hi, this is a great point and one that I've already given consideration to. I'll address separately the issue of the primary datastore, i.e. Postgres, and the Redis dependency:

Postgres as the only option for the data store

There are 2 reasons for this:

Performance: while SQLite could offer a simpler/embedded data store, it simply doesn't have the performance and features of Postgres. Bitmagnet has a faceted search engine and is write-intensive (it will be discovering ~5k torrents per hour and writing these to the database along with associated metadata). As such, its database may not be suitable for running on older hardware. A SQLite adapter, if it were developed, may simply not be up to the job (although as I haven't attempted this I can't say what the performance would be like). That said, Bitmagnet itself is not especially resource intensive; you could probably run it on a Raspberry Pi but point it to a Postgres instance on some more powerful hardware. At this stage I've only been running it on an M2 Mac Mini with Postgres located on its SSD, so I would be interested to know people's mileage on other hardware.
Development, support and maintenance overhead: I'm a lone developer and this project is already too big for one person. A SQLite adapter, if feasible performance-wise, I think could only happen if other contributors joined the project as my to-do list is already pretty long. It would have to achieve feature parity with the Postgres implementation which makes use of several Postgres-specific features and extensions. It would also mean a longer testing cycle and therefore probably a slower release cadence. That said, if there was enough demand and assistance then I'd be open to looking into the feasibility of this once the rest of the application is a little more mature and the current database schema more finalised.

Redis dependency

Redis is currently used only for the asynchronous task queue. I would have liked to put this in Postgres, but there simply is not a good out-of-the-box solution that works well with Postgres and Go, and is actively maintained. I looked at quite a few queuing libraries and eventually settled on asynq (https://github.com/hibiken/asynq), which is a great library and does the job well, but could really do with support for non-Redis backends.

Using Redis here was a pragmatic decision that allowed me to make progress, rather than an optimal one. I guess I could have built a simple Postgres-based queue myself but that would have been a distraction and probably sub-optimal compared with a mature/separately developed library. It remains an option. Since I looked into this, a new project has sprung up which I'm keeping an eye on: https://www.tork.run/. It has a Postgres backend and looks like it might be up to the job, but is very new.

So yes, I'm very aware that the additional Redis dependency is not ideal and it may well disappear at some point.

mlunar@lemmy.world · 1 year ago

Hi, those points are certainly valid and I have nothing against these picks!

I just wanted to chime in that perf might not be as big of a problem as you might expect. 5k/hour is about 1.4 writes/sec, which SQLite should for sure be able to handle.

In fact, you can do hundreds to thousands of writes/sec, as long as you batch them in transactions (as by default each query is executed in its own transaction).
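To illustrate the batching point, here is a minimal sketch using Python's built-in sqlite3 module. The `torrents` table and the fake rows are made up for the example and are not Bitmagnet's real schema, and the actual numbers will depend heavily on your disk, so treat it as something to try on your own hardware rather than a benchmark.

```python
import os
import sqlite3
import tempfile
import time


def open_db(path):
    # isolation_level=None puts the sqlite3 module in autocommit mode, so
    # transactions only exist where we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    # Hypothetical, simplified table; not Bitmagnet's real schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS torrents (info_hash TEXT PRIMARY KEY, name TEXT)"
    )
    return conn


def fake_torrents(n, start=0):
    # Made-up rows standing in for discovered torrents.
    return [(f"{i:040x}", f"torrent-{i}") for i in range(start, start + n)]


def insert_individually(conn, rows):
    # Each INSERT runs as its own transaction, so each one pays for a commit.
    for row in rows:
        conn.execute("INSERT INTO torrents VALUES (?, ?)", row)


def insert_batched(conn, rows):
    # One explicit transaction around the whole batch: a single commit.
    conn.execute("BEGIN")
    try:
        conn.executemany("INSERT INTO torrents VALUES (?, ?)", rows)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM torrents").fetchone()[0]


if __name__ == "__main__":
    print(f"target rate: {5000 / 3600:.2f} writes/sec")
    with tempfile.TemporaryDirectory() as tmp:
        # Use an on-disk file, since the commit cost is mostly disk syncing.
        conn = open_db(os.path.join(tmp, "demo.db"))
        n = 500
        t0 = time.perf_counter()
        insert_individually(conn, fake_torrents(n))
        t1 = time.perf_counter()
        insert_batched(conn, fake_torrents(n, start=n))
        t2 = time.perf_counter()
        print(f"individual: {n / (t1 - t0):.0f} rows/sec")
        print(f"batched:    {n / (t2 - t1):.0f} rows/sec")
        conn.close()
```

The only difference between the two insert functions is where the transaction boundary sits: one commit per row versus one commit per batch.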

droopy4096@lemmy.ca · 1 year ago

Thank you for such a detailed response. I would love to contribute and would be willing to add a sqlite adapter, but at the moment my capacities are rather limited. From your description it sounds like the architecture is currently narrowly locked on PostgreSQL features. In my daily job I love PostgreSQL for big apps and stacks but I'm also aware how "hungry" PG can be, which is why I'm wondering whether it's "too big of a hammer" for this particular problem. Also, setting up a single service is easier for novices than maintaining several. Docker compose is nice but it has its limitations.

Stephen304@lemmy.ml · 1 year ago

A DHT crawler is inherently an intensive service to run. magnetico used sqlite and would take 10 minutes just to load the splash page that includes the total count of discovered torrents.
