Build a Database in Four Months with Rust and 647 Open-Source Dependencies

130 points 148 comments 8 months ago

moi2388

“With a team of three experienced developers, we have implemented ScopeDB from scratch”

“with 100 direct dependencies and 647 dependencies in total”

Next up: watch me build numpy from scratch with only 150 dependencies, one of which is numpy.

remram

You're not wrong, they depend on an external SQL database, which they access with sqlx.

tison

In the linked article below, we talked about "If RDS has already been used, why is another database needed?" and "Why RDS?"

Briefly, you need to manage metadata for the database. You can write your own Raft-based solution or leverage existing software like etcd or ZooKeeper, which may not be "a relational database". But then you need to deploy them with EBS and reimplement data replication and multi-AZ fault tolerance, and the result likely still performs worse than RDS, because a first-class RDS can typically use internal storage APIs and advanced hardware. Such a scenario is not software-driven.

https://flex-ninja.medium.com/from-shared-nothing-to-shared-...

stuhood

When it comes to understanding the risks involved with having this many dependencies, one thing that folks might not understand is that Rust's support for dependency resolution and lock files is fantastic.

Tools like `cargo audit` can tell you statically based on the lockfile which dependencies have security vulnerabilities reported against them (but you have to run it!). And Github's https://github.com/dependabot/ will do that same thing automatically, just based on the existence of the lockfile in your repo (and will also open PRs to bump deps for you).
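To make "statically based on the lockfile" concrete, here is a minimal sketch of the matching step such a tool performs: compare every locked (name, version) pair against a list of advisories. The `Locked` and `Advisory` types, and the rule that "every version below `patched` is affected", are simplifying assumptions for illustration; `cargo audit` itself reads the RustSec advisory database and handles full semver ranges.

```rust
/// One locked dependency: a (name, version) pair as recorded in Cargo.lock.
pub struct Locked<'a> {
    pub name: &'a str,
    pub version: (u64, u64, u64),
}

/// A hypothetical advisory: every version strictly below `patched` is affected.
pub struct Advisory<'a> {
    pub id: &'a str,
    pub crate_name: &'a str,
    pub patched: (u64, u64, u64),
}

/// Return (advisory id, crate name) for each locked crate matched by an advisory.
/// No code is compiled or run; only the lockfile contents are inspected.
pub fn audit<'a>(lock: &[Locked<'a>], advisories: &[Advisory<'a>]) -> Vec<(&'a str, &'a str)> {
    let mut hits = Vec::new();
    for dep in lock {
        for adv in advisories {
            // Tuples compare lexicographically, so (1, 2, 0) < (1, 10, 0) as intended.
            if dep.name == adv.crate_name && dep.version < adv.patched {
                hits.push((adv.id, dep.name));
            }
        }
    }
    hits
}
```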

And as mentioned elsewhere: Cargo's dependency resolver supports providing multiple versions of a dep in different dependency subgraphs, which all but eliminates the "dependency hell" that folks expect from ecosystems like Python or the JVM. Two copies of a dep at different versions? Totally fine.
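A rough sketch of how you could see those multiple versions yourself: read the `[[package]]` entries of a Cargo.lock and list the crates that appear more than once. This is a line-based approximation written for illustration, not a real TOML parser (it assumes `name` comes before `version` in each entry); in practice `cargo tree --duplicates` reports the same information.

```rust
use std::collections::BTreeMap;

/// Collect (name, version) pairs from the `[[package]]` entries of a Cargo.lock.
/// Line-based sketch: the top-level `version = 3` line is skipped because it is unquoted.
pub fn locked_packages(lockfile: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    let mut name: Option<String> = None;
    for line in lockfile.lines().map(str::trim) {
        if line == "[[package]]" {
            name = None;
        } else if let Some(rest) = line.strip_prefix("name = \"") {
            name = rest.strip_suffix('"').map(str::to_owned);
        } else if let Some(rest) = line.strip_prefix("version = \"") {
            if let (Some(n), Some(v)) = (name.take(), rest.strip_suffix('"')) {
                out.push((n, v.to_owned()));
            }
        }
    }
    out
}

/// Crates locked at more than one version, keyed by crate name.
pub fn duplicated(packages: &[(String, String)]) -> BTreeMap<String, Vec<String>> {
    let mut by_name: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (name, version) in packages {
        by_name.entry(name.clone()).or_default().push(version.clone());
    }
    by_name.retain(|_, versions| versions.len() > 1);
    by_name
}
```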

Threadbare

Doesn't node npm also do similar?

stuhood

Yes. AFAIK, it evolved over time across 3+ package managers (`npm`, `yarn`, `pnpm`, etc), but the current state of that ecosystem is similar (including the behavior of dependabot).

robertlagrant

Python's Poetry has poetry audit as well, and there are third-party tools such as Safety (Python), Nancy (Golang), etc. Lots of languages have something like this.

stuhood

They support lockfiles and tools like `audit`, yes. But they do not support having multiple versions of a dependency.

Tools based on loading libraries from a *PATH (Go, Python, JVM) usually do so by grabbing the first one that they encounter that contains the appropriate symbols. That is incompatible with having multiple versions of a package.

On the other hand, Rust and node.js support this -- each in their own way. In Rust, artifact names are transparently suffixed with a hash to prevent collisions. And in node.js, almost all symbol lookups are accomplished with relative filesystem paths.
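A toy model of the difference, under loudly stated assumptions: `artifact_name` derives a suffix from the crate name and version with std's `DefaultHasher`, loosely mimicking Cargo's `lib<name>-<hash>.rlib` file names, while `first_on_path` models a *PATH-style lookup where the first match wins. Both function names are hypothetical. Cargo's real hash covers more inputs (features, profile, target, source), and `DefaultHasher` output is not guaranteed to be stable across Rust releases, so this only illustrates the idea.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative only: a per-(name, version) suffix keeps two versions of the
/// same crate from colliding, so both can be linked into one program.
pub fn artifact_name(name: &str, version: &str) -> String {
    let mut h = DefaultHasher::new();
    (name, version).hash(&mut h);
    format!("lib{}-{:016x}.rlib", name, h.finish())
}

/// A PATH-style lookup: the first (name, version) entry with a matching name wins,
/// so only one version of a package can ever be loaded.
pub fn first_on_path<'a>(path: &[(&'a str, &'a str)], name: &str) -> Option<&'a str> {
    path.iter().find(|(n, _)| *n == name).map(|(_, v)| *v)
}
```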

robertlagrant

True!

hulitu

> Tools like `cargo audit` can tell you statically based on the lockfile which dependencies have security vulnerabilities reported against them

Known security vulnerabilities. If someone compromises your cargo repository (see npm for examples), all your safety is gone.

binaryturtle

Isn't that something that should be posted on April 1? I'm really not sure whether the author is proud of the fact that his project has so many dependencies. Is that something modern coders aim for these days? I usually try to achieve the exact opposite in my projects.

griomnib

April 20th as you’d have to be high as hell to think this was a good idea.

hulitu

> Is that something modern coders aim for these days?

Yes. No dependencies is so 80's. Just run an ldd on your commonly used programs.

bdcravens

Even developers with "few" dependencies often lean on projects (languages, frameworks, etc) where there are hundreds of dependencies.

joquarky

I also prefer to minimize dependencies, and it feels like this is why I can't find work.

johnisgood

So do I. I am writing a Perl script right now, and I could either use a non-core dependency, or implement my own. I went with my own. It is only a few lines of code. It works without the need to cpan i the module.

ramon156

It's just really tongue-in-cheek about everything, which makes this article more fun to read imo

thadt

"An absolutely outrageous number of dependencies! What a bunch of wankers."

I comment, in a Chromium[1] tab, running on my Ubuntu[2] box.

[1] https://github.com/chromium/chromium/blob/main/.gitmodules

[2] https://releases.ubuntu.com/24.04/ubuntu-24.04.1-desktop-amd...

mlok

Related: "Build a Database in 3000 Lines with 0 Dependencies" https://news.ycombinator.com/item?id=42725163

ergonaught

While acknowledging one does not "have to" have so many dependencies, the prevalence of this npm-esque type of practice is one of the two things that destroyed all of my interest in Rust.

sealeck

Rust dependencies tend to be pretty high quality in my experience. Maintained by experts and offer new improvements over state-of-the-art.

But compared to C/C++, with Rust you at least _can_ use dependencies without being required to. In C/C++, using them is a _massive_ pain even when you want to.

rectang

I care less about the quality of the dependencies than about the burden of protecting against supply chain attacks when there are a lot of dependencies.

humanfromearth9

In the past, I worked on a project for Luxembourg's CTIE (their IT administration). In most cases, they explicitly requested that we reimplement the features we needed instead of including more third-party libraries. They only allowed libraries essential to the project, like Struts for the web framework, or implementations of standard APIs like JPA, JTA, etc. that came with WebSphere. Everything else, we had to reimplement. For them, this was simply much easier to manage, given the number of systems they have to manage. And the allowed libraries were only allowed in versions they had previously reviewed for security issues. In the end, reimplementing features that we could have pulled in from other libraries never caused any problems: the practice requires some additional work, but it never significantly affected our ability to deliver the project as expected.

BodyCulture

How many people do the security code review in this process? How do they avoid piling up dozens of well-hidden holes when they don't use a library that is publicly available and seen by thousands of eyes?

Isn't the best argument for open source code that it has so many people looking at it, a kind of global quality assurance that most companies cannot afford?

kibwen

Indeed, and that's a good reason to avoid third-party dependencies. But that's irrelevant to the choice of programming language. A language with a bad dependency manager might force you to build everything yourself, but you can always do that even in a language with a good dependency manager: if you care, you just choose to build everything yourself.

Perplexingly, the original commenter seems to understand that this doesn't matter, and then handwaves away the correct conclusion.

rectang

It remains relevant to programming language choice because the "best in class" libraries in Rust often have lots of dependencies, thanks to Rust culture and cargo's design.

I'd like to be able to pick a few libraries without incurring a huge ongoing audit burden. If I have to exclude many popular libraries because they have oodles of dependencies, that both makes searching more laborious and limits my choices.

kibwen

I still don't understand what alternative people are arguing in favor of. When I think of those "best in class" libraries like regex, serde, etc, those are multiple crates that are developed by the same teams. Having one massive crate or one hundred tiny crates is irrelevant here, because if they're all developed by the same contributors it does not increase your trusted computing base.

estebank

I do think that there's some work that can be done to improve reporting on our side: cargo should be able to report not just "how many crates are in the dep tree" but rather "how many owners am I depending on" and "how many repositories am I depending on". For example, I just noticed that there's no way to see in crates.io other crates that live in the same repository, like it does per owner, even though it has that information available.
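A small sketch of the kind of report proposed here: the same dependency tree summarized as crates, distinct owners, and distinct repositories. The `CrateInfo` type and the sample data are hypothetical; real owner and repository metadata would have to be fetched from crates.io or read from each crate's manifest.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-crate metadata.
pub struct CrateInfo<'a> {
    pub name: &'a str,
    pub repository: &'a str,
    pub owners: &'a [&'a str],
}

/// Summarize a dependency tree as (crates, distinct owners, distinct repositories).
pub fn trust_summary(deps: &[CrateInfo<'_>]) -> (usize, usize, usize) {
    // Sets deduplicate owners and repositories shared across many crates.
    let owners: BTreeSet<&str> = deps.iter().flat_map(|c| c.owners.iter().copied()).collect();
    let repos: BTreeSet<&str> = deps.iter().map(|c| c.repository).collect();
    (deps.len(), owners.len(), repos.len())
}
```

With three crates from two repositories maintained by two people, the report reads (3, 2, 2) rather than just "3 dependencies".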

claytonwramsey

Cargo currently has `cargo tree`, which prints out a dependency tree. There's an extension to cargo which also shows how many people have the ability to push to your dependencies, titled `cargo-supply-chain`.

https://github.com/rust-secure-code/cargo-supply-chain/

nicoburns

Yeh, this would be great. I'd also love to see the ability to publish multiple library crates as a single package.

rectang

> "how many owners am I depending on"

Yes, knowing that would be helpful!

Is there a way to whitelist owners/publishers in Cargo?

0x457

There is `cargo-deny` that handles some enforcement: https://github.com/EmbarkStudios/cargo-deny. Doesn't handle authors, but I suspect it's easy to add?

There are really just a handful of crates that nearly always get pulled in, and probably around 5 authors across them.

Supply chain hardening is pretty easy in Rust: cargo-deny, cargo-supply-chain, cargo-crev, cargo-vet, cargo-{s}bom and probably a few more I can't remember.

estebank

No tool for that exists afaik, but all the pieces to make it are there.
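The policy half of such a tool could be as small as the sketch below: given each crate's owners, flag any crate with at least one owner outside an allowlist. Fetching the owner data (for example from crates.io) is assumed to happen elsewhere, and the function name as well as the crate and owner names are hypothetical.

```rust
use std::collections::HashSet;

/// Return the crates that have at least one owner not on the allowlist.
/// `deps` pairs each crate name with its (externally fetched) owner list.
pub fn unapproved<'a>(deps: &[(&'a str, &[&str])], allowed: &HashSet<&str>) -> Vec<&'a str> {
    deps.iter()
        .filter(|(_, owners)| owners.iter().any(|o| !allowed.contains(*o)))
        .map(|(name, _)| *name)
        .collect()
}
```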

ncruces

I'd settle for most dependencies not having any dependencies at all, or at a minimum making a serious effort to only add dependencies that really pull their own weight.

This starts with explaining outright which dependencies they have and why.

It's not so much direct dependencies that bother me: it's an exponential explosion of transitive dependencies.

Also, seeing an “end product” with dozens of dependencies doesn't bother me much; a library does.
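On the "exponential explosion" point, one detail worth keeping in mind is that Cargo resolves a graph, not a tree: a crate shared by many dependents appears once in the resolved graph (per semver-incompatible version). The sketch below counts the distinct crates reachable from a root in a hypothetical dependency graph, so shared dependencies are counted only once.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Distinct crates reachable from `root` (excluding `root` itself unless there is a cycle).
/// Depth-first traversal; `seen` ensures each shared dependency is visited once.
pub fn transitive_deps<'a>(graph: &BTreeMap<&'a str, Vec<&'a str>>, root: &'a str) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(krate) = stack.pop() {
        for &dep in graph.get(krate).into_iter().flatten() {
            if seen.insert(dep) {
                stack.push(dep);
            }
        }
    }
    seen
}
```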

FridgeSeal

Is each web framework expected to reimplement regex, path matching and HTTP protocol logic themselves?

Should every physics/ML/etc library have to write their own array abstractions and operators? If every networking library had to write and ship their own async executor logic, I can safely posit that would be about 100x worse than it is now.

The alternative to “zero dependencies” isn’t full “JS/NPM-dumpster-fire”.

ViewTrick1002

The best in class libraries depend on many crates. But crates are often used in workspaces to speed up compilation or split up independent parts.

So how many dependencies are there truly when you peel away the first layer of the onion?

https://doc.rust-lang.org/cargo/reference/workspaces.html

rectang

I dunno! Sounds complicated.

The obvious answer is "N crates is N dependencies", because each crate represents a discrete sequence of atomic software release packages.

In the absence of a standardized mechanism to group crates together, we have to fall back to informal methods, like "I know all these authors personally because I'm an insider", or "these crates seem to be related even though I'm unsure how to guarantee they'll stay that way".

You can take a hard line and insist that nobody should run a single line of code they haven't reviewed, but that severely constrains the ability of a typical org to use the wider ecosystem at all. Not every org has the expertise on staff to pore over diverse Rust code and confidently state that it has no issues, and even those that do have to consider whether paying that cost is good risk management.

It would be nice if there was a more reliable way to simplify the evaluation of publisher trust centers, especially for orgs who aren't going to audit code but don't want to blindly take in anything.

infogulch

What is the shape of these dependency trees? Is it really hundreds of single-type + single-function crates? Could there ever be a path to scrub out the smaller dependencies and integrate them into larger crates with more concrete functionality?

What's the status of potential distributed code review systems like cargo-crev?

kibwen

> Is it really hundreds of single-type + single-function crates?

No, and I think this is the crucial thing that people with NPM experience overlook when it comes to Rust. Rust emphatically does not have a culture of single-function microlibraries; instead, libraries are split by purpose, in the same way you would modularize a C codebase.

Remember, Rust crates are not just units of distribution, they are also units of translation (a.k.a. compilation units), so the same pressures that cause people to split C projects into multiple files results in people splitting Rust projects into multiple crates.

infogulch

That makes sense. It seems this "problem" is unlikely to be "solved" by reintegrating projects into larger crates, for good technical and social reasons. Then the solution is to reframe the problem.

Distributed code review is a brute-force style solution. Republishing collections of crates under a single name/version is a dimensionality-reduction and responsibility-concentration style solution. I suspect pure-PR style solutions will be ineffective. What other kind of solutions are there?

ironhaven

Well, if you look at the most recent open source supply chain attack on OpenSSH, it used social engineering to add a backdoor to a project that OpenSSH did not depend on anywhere in its SBOM. And in the xz case, the backdoor's deployment had to be rushed, because the dynamic dependency was being removed before the backdoor was completely in place. Doing an open source supply chain attack is not easy, fast, or reliable for long.

It is not as simple as you say. Sometimes it is better to know that all of your dependencies are statically linked at build time and specified when you release your code. And the saner your build system is, the harder it is to add shellcode to your dependency's tarball and build scripts without turning people's heads with random unsafe code.

wakawaka28

>And with the xz example the backdoor had to be rushed out when it was deployed because the dynamic dependency was being removed before the backdoor was completely in place. Doing an open source supply chain attack is not easy, fast or reliable for long.

If the xz backdoor had not been found due to dumb luck, it could have persisted for a long time. Backdoors have persisted for years before, maybe even decades. It's also a package with a lot of eyes on it compared to obscure packages. So I don't think you're right even a little bit, especially in huge projects or projects with LOTS of dependencies.

ryanisnan

I don't think any of your points detract from the original argument. Having more dependencies just widens the attack surface area, and makes an attack like this easier, depending on the motivation and resources of an MCA.

goodpoint

> Doing an open source supply chain attack is not easy

When projects ship 600 dependencies it's really easy.

> not easy, fast or reliable for long.

It does not need to be long. One day is enough to compromise a system or a thousand.

whodev

Thank you.

As someone who works in cybersecurity and works closely with our developers: a lot of them tend to inherently trust third-party code with no auditing of the supply chain. I am always pushing back on that. While, yes, we don't need to reinvent the wheel and libraries/packages are important, our organization and developers need to be aware of what we are bringing into our network and our codebase.

kibwen

As someone who also works in cybersecurity, we use Rust extensively and are sunsetting all of our C code. We use third-party dependencies judiciously and never deploy anything without auditing it. It's great that Rust facilitates this convenient ecosystem, and it's to Rust's benefit, not Rust's detriment, that the ecosystem exists.

SleepyMyroslav

Many orgs want to save costs while using open source software. In the world after the xz incident, they are now in a very difficult place. It is all about whom one trusts. As someone who works in a cost-aware business like gamedev, I think Rust has a unique chance to capture the trust market. If major sponsors of Rust organizations shared information about even a fraction of what they have audited, that could create a safe haven for smaller orgs and startups. Even large orgs with decades of audit history and a long list of C/C++/C#/Java projects they trust might buy in, because it is not reasonable to expect them to keep up with every open source project's updates.

whodev

> We use third-party dependencies judiciously and never deploy anything without auditing it.

This is how I think it should be of course. Like I said, I'm not against the use of third-party code or dependencies, I'm against using them without performing any audit of that code.

larusso

Nothing stops you from vendoring them into your repo and updating each one by hand. But how would you do this in C++? Write everything from scratch? I mean, Rust doesn't stop you there either.

[edit] typos

0x457

Well, no one is forcing you to use these dependencies? Rust crates tend to be very minimal because of how easy they are to use.

The amount of code you have to review stays the same.

cchance

So... then don't use them? No one forces anyone to use any dependencies in Rust lol, it's just faster to use shit that's already made

goodpoint

One malicious dependency is enough. When you have 600 dependencies "tend to be pretty high quality" does not cut it.

orf

It's completely stupid to measure "number of dependencies" in absolute numbers.

Lots of packages have a `-macros` or `-derive` transitive dependency, meaning a single dependency can end up counting as 3 additional dependencies.

Rust makes it simple to split packages into workspaces - for example, regex[1] consists of `regex-automata` and `regex-syntax` packages.

This composition and separation of concerns is a sign of good design, and not an npm-esque hellhole.

[1] https://crates.io/crates/regex/1.11.1/dependencies
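A rough heuristic for the counting problem described here: fold `foo-derive`, `foo_derive`, `foo-macros`, and `foo_macros` into `foo` whenever `foo` is also present. This is an illustration, not a feature of any existing tool, and it deliberately leaves workspace siblings such as `regex-syntax` uncollapsed because they do not follow that naming pattern.

```rust
use std::collections::BTreeSet;

/// Estimate "logical" dependencies by folding proc-macro companion crates into their parent.
pub fn logical_deps<'a>(crates: &[&'a str]) -> BTreeSet<&'a str> {
    let all: BTreeSet<&str> = crates.iter().copied().collect();
    crates
        .iter()
        .map(|&name| {
            for suffix in ["-derive", "_derive", "-macros", "_macros"] {
                if let Some(base) = name.strip_suffix(suffix) {
                    // Only fold when the parent crate is actually in the tree.
                    if all.contains(base) {
                        return base;
                    }
                }
            }
            name
        })
        .collect()
}
```

Fed `serde`, `serde_derive`, `regex`, `regex-syntax`, and `regex-automata`, this reports four logical dependencies instead of five.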

rectang

I suppose you could say that the audit burden scales linearly with the number of module publishers, with a small additional amount on every release point to confirm that the publisher is still who they purport to be and hasn't been compromised.

This is assuming that the audit consists of validating dependency authorship, and not the more labor-intensive approach of reviewing dependency code.

0x457

Hard no. Burden scales with number of lines. Lines being split into smaller chunks (crates) only speeds up the process in the long run.

orf

Hard yes, burden scales with number of authors and not number of lines.

That’s… the whole rationale about not liking lots of small packages.

0x457

Are you reviewing code you're pulling into your code base (that is usually organized and counted in lines, smartass) or authors?

Either way, with rust it's a handful of authors, but just because they are proven to be good faith actors, doesn't mean trust in their code is implied when we're talking about supply chain hardening.

rectang

From upthread:

> This is assuming that the audit consists of validating dependency authorship, and not the more labor-intensive approach of reviewing dependency code.

So, obviously: authors.

I took your reply of "hard no" to be a rejection of validating authors as sufficient hardening and an assertion that only line-by-line code review meets your standards. Fine, but if your answer is always going to be "doesn't matter, not good enough", we can't have a reasonable conversation about how best to validate authors.

0x457

No, line-by-line meets my standard. I don't think just validating authorship is enough.

rcxdude

That depends on whether you want to vet the authors or the code itself.

orf

Sure, but then we could just take all the dependency code and put it in a single line to make it quicker to review.

rcxdude

Indeed. It's actually been quite handy on a few occasions to be able to just pull in a smaller crate as opposed to the whole project (in contrast to, say, Boost in C++, which is a big mess of a dependency even though it goes to at least a little effort to let you split it up, but through an ad-hoc process as opposed to a standard package management system).

(I would genuinely be interested in an experiment which pushes this as far as possible: what if each function was a 'package'? There's already a decent understanding of how dependencies within a library work in the compiler; what if that extended to the package manager? You would know exactly what code you actually needed, and would only pull in exactly what was necessary.)

arccy

That's kind of on Rust for pushing crates front and center rather than groupings of crates that are developed / reviewed / released together as a single cohesive unit (typically a git repo).

E.g. Go dependencies are counted by module (roughly git repos) rather than by package (directories, compilation units). Java is counted in packages rather than classes.

adamc

The vulnerability to supply chain attacks gives me pause. It's not unique to rust and it bothers me with npm or Python as well.

PittleyDunkin

What are you comparing this to? Do you have positive examples? This seems to be a general dependency management issue unrelated to Rust. The reason C++ has this is that C++ also lacks any concept of dependencies, so people kind of just make do with modifying what packages are already integrated into the build process. This certainly doesn't imply you should trust Boost (or the standard library, or whatever people use this decade, or xz, or whatever).

kibwen

How many transitive dependencies is the right number for a database?

jandrewrogers

Honestly, current best practice puts that number right around zero, which you see for ambitious implementations.

A non-obvious issue is that database engines have peculiar requirements for how libraries are designed and implemented which almost no conventional library satisfies. To make matters worse, two different database implementations may have different requirements in this regard, so you can't even share libraries between databases. There are no black boxes in good database engines.

almostdeadguy

Compression libraries, OpenSSL, ICU, etc. are all common dependencies for databases.

Looking at the dependencies list (https://gist.github.com/tisonkun/06550d2dcd9cf6551887ee6305e...) I see plenty of reasonable things like:

* Base64/checksum/compression encoding libraries

* Encryption/hash libraries

* Platform-specific bindings (likely conditional dependencies)

* Bit hacking/casting/zero-copy libraries like bytemuck, zerocopy, zero-vec, etc.

* "Small"/stack allocated data structure libraries (smallvec, tinystr, etc.)

* Unicode libraries

There are certainly things that would add bloat too, but I think it's silly to pretend like everything here is something a database engine would need custom implementations of.

jandrewrogers

I think you'd be surprised how many of these things are custom implementations in databases. The main motivation is performance. Databases tend to have detailed and well-specified constraints on each use case for data structures and algorithms, which can be used to codegen narrowly optimized implementations. You can do significantly better than generic library codecs or data structures in most cases; those implementations lack the context and metaprogramming hooks to make it feasible.

Combine this with the challenge of implementations being async, non-allocating, compatible with explicitly paged memory, etc., and it generally becomes worth the effort.

You'll find more libraries used at the periphery for integration and compatibility where it matters less but not in the core.
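As an illustration of the kind of specialization described above (not a claim about how any particular database does it): a codec that knows its input is a sorted sequence of timestamps can store deltas as LEB128 varints, so values close together in time take one or two bytes each instead of eight. A generic codec cannot make that assumption. The function names are hypothetical.

```rust
/// Encode sorted timestamps as deltas, each packed into an LEB128 varint.
/// Panics if the input is not sorted, since the scheme relies on non-negative deltas.
pub fn encode_sorted(timestamps: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    for &t in timestamps {
        assert!(t >= prev, "input must be sorted");
        let mut delta = t - prev;
        prev = t;
        // Emit 7 bits per byte; the high bit marks "more bytes follow".
        loop {
            let byte = (delta & 0x7f) as u8;
            delta >>= 7;
            if delta == 0 {
                out.push(byte);
                break;
            }
            out.push(byte | 0x80);
        }
    }
    out
}

/// Inverse of `encode_sorted`: rebuild each delta, then add it to the running total.
pub fn decode_sorted(bytes: &[u8]) -> Vec<u64> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    let mut delta = 0u64;
    let mut shift = 0u32;
    for &b in bytes {
        delta |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            prev += delta;
            out.push(prev);
            delta = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    out
}
```

For `[1_700_000_000_000, 1_700_000_000_010, 1_700_000_000_250]` the encoding takes 9 bytes rather than 24.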

0x457

Pretty sure it's not due to performance, but due to the age of most database codebases and, in some cases, licensing. How annoying it is to have dependencies in C and C++ is also probably a contributing factor.

I'd rather an author pulls in a tinyvec/serde than tries to make a bespoke implementation.

jandrewrogers

You are entitled to your take, but it is a demonstration that you’ve never been anywhere near a modern database engine code base. You can’t defend your opinion. It is what people with no experience would say.

That’s fine, I have no objection to hot takes, but don’t conflate your limited experience with reality.

0x457

Never claimed that I was an expert or anything. I interpolated my opinion from my experience with large C code bases.

I get it that you work on a geospatial database and understand this field better than I do, but why so rude and dismissive? Work on that.

kibwen

> current best practice puts that number right around zero

In the case where the answer is "zero", then that means that one does not actually need a package manager at all, in which case the features of the package manager are not relevant to the choice of language. This would imply that the parent commenter has no need to reject Rust.

brabel

Just tried to look at what some macro was generating using cargo-expand. It requires a LOT of dependencies. Took like 5 minutes to compile it all (run `cargo install cargo-expand` if you want to try). I almost aborted because the description of the crate says "Wrapper around rustc -Zunpretty=expanded." so I had expected the simplest possible crate to do that.

PittleyDunkin

> Took like 5 minutes to compile it all

TBF this has nothing to do with dependency complexity and everything to do with semantic complexity. You could easily do this without using any dependencies at all.

unless you're downloading dependencies during the build or something like that, of course.

klysm

This take is utter nonsense to me - just don't use them...

FullGarden_S

imagine if one dependency is GPL lol

With over 600 dependencies, the probability goes up and up.

Deukhoofd

I really chuckled about how the blog post opens with how great Rust's open-source ecosystem is, and ends with an "anyway, we made our software private and proprietary"

bbkane

That's technically correct, but they listed several ways they contribute back to the OSS ecosystem: PRs, issues, creating new libraries...

This comment makes it seem like all this company does is take, which feels unfair to me

ipaddr

"We keep ScopeDB private and proprietary, while we actively get involved and contribute back to the open-source dependencies, open source common libraries when it’s suitable"

They say they do when suitable (never or rarely).

But that's fine, as the licenses allow it. It feels like another company blogging about how great open source is to get PR while closed-sourcing their product.

The older I get, the more I understand why GPL variants are superior to BSD if you want to grow the software. BSD is good for throwaway code or standards you want others to adopt.

PittleyDunkin

>This comment makes it seem like all this company does is take, which feels unfair to me

Profit isn't far removed from theft, so maybe this shouldn't feel so unfair.

bbkane

> Profit isn't far removed from theft

I definitely think there are unethical ways to profit - capitalism needs to be regulated for the good of the consumer/ecosystem/society.

However, I don't believe that a blanket comparison of any type of profit to theft can be useful or correct.

> so maybe this shouldn't feel so unfair

Do you think this company is unethical for writing closed source software and trying to sell it?

tison

And contributing back is one of the approaches to maintaining open-source dependencies. I have described how to deal with OSS dependencies in [1] (yet to translate it :P).

[1] https://www.tisonkun.org/2024/11/17/open-source-supply-chain...

tison

This article is actually a translated one. In the original article[1], I talked about commercial open-source and how one can collaborate with the open-source community when running a software business.

This section is moved to the second-to-last section in the posted blog, including:

[QUOTE]

When you read The Cathedral & the Bazaar, for its Chapter 4, The Magic Cauldron, it writes:

> … the only rational reasons you might want them to be closed is if you want to sell the package to other people, or deny its use to competitors. [“Reasons for Closing Source”]

> Open source makes it rather difficult to capture direct sale value from software. [“Why Sale Value is Problematic”]

While the article focuses on when open-source is a good choice, these sentences imply that it’s reasonable to keep your commercial software private and proprietary.

We follow it and run a business to sustain the engineering effort. We keep ScopeDB private and proprietary, while we actively get involved and contribute back to the open-source dependencies, open source common libraries when it’s suitable, and maintain the open-source twin to share the engineering experience.

[QUOTE END]

I wrote other blogs to analyze open-source factors within commercial software[2][3][4][5], and I have practiced them in several companies as well as earned merits in open-source projects.

When you think about it, many developers work for employers, and using open-source software in their $DAYJOB is a good motivation to contribute more (especially for distributed systems; individuals seldom need one). I know there are open-source developers who develop software that has nothing to do with their $DAYJOB. I'm also maintaining projects that have nothing to do with my $DAYJOB (check Apache Curator, the Java binding of Apache OpenDAL, and more).

[1] https://www.tisonkun.org/2025/01/15/open-source-twin/

(Need a translator) [2] https://www.tisonkun.org/2022/10/04/bait-and-switch-fauxpen-...

[3] https://www.tisonkun.org/2023/08/12/bsl/

[4] https://www.tisonkun.org/2022/12/17/enterprise-choose-a-soft...

[5] https://www.tisonkun.org/2023/02/15/business-source-license/

easterncalculus

From the title I was really expecting this page to be a tutorial like build-your-own[1].

[1]: https://build-your-own.org/database/

01HNNWZ0MV43FF

That's why all my useless little crates are AGPL :D

bdcravens

Isn't that pretty much the modern stack? Open source language, framework, and libraries, and proprietary end product?

goodpoint

And then tech companies fire engineers while making record profits.

PittleyDunkin

> I really chuckled about how the blog post opens with how great Rust's open-source ecosystem is, and ends with an "anyway, we made our software private and proprietary"

I mean, that's been the prevalent attitude for the entire history of open source. It's easy to laugh until someone replaces you.

dabinat

I was hoping this would be a discussion of Rust build times and how they optimized them with that number of dependencies.

But I think it’s easy for people to criticize dependencies from afar without understanding what they’re used for. I’m sure the dependencies in my projects would look strange to others - for example, I use three HTTP libraries: one for 95% of cases and the others for very specific use-cases where I need control at a low level. But without that context it might seem excessive.

flufluflufluffy

I read the title thinking it was a joke, and after reading the article, I still can’t tell if it is or not.

etaioinshrdlu

My main question is why observability data needs (or benefits from) a tailor-made database instead of a general purpose one. In 2025, anyone working on observability who told me they have to build their own database, I would be very suspicious!

tison

Datadog builds their own event store: https://www.datadoghq.com/blog/engineering/introducing-husky...

It may not be called a "database", but it actually takes the place of one.

Observability vendors often try to store logs in Elasticsearch and later find it overly expensive, with weak support for archiving cold data. Data warehouse solutions require a complex ETL pipeline and can be awkward when handling log data (semi-structured data).

That said, if you're building an observability solution for a single company, I'd totally agree with starting from a single-node PG with backups, and only considering other solutions when data and query workloads grow.

jcgrillo

In 2025 I'd consider starting with ClickHouse instead, if you're going the DIY route.

Jolter

Not even limited to general purpose ones, there are existing tailor made databases for observability. Maybe somewhere on that page, they explain why this one is better.

EVa5I7bHFq9mnYK

57 of which written by DPRK Koding Forces, waiting for the right moment to push a glorious update, striking at the heart of The Biggest Enemy.

robertclaus

I'm having a real crisis trying to decide whether this system should be called a database or not. It's a system for managing data, so obviously it is.. but by that loose interpretation any CRUD webserver would count too.

estebank

Yet another thread where people go "Dependency number too big! Rust bad!" with the level of nuance of my dogs discussing dinner.

The full list is linked in the article https://gist.github.com/tisonkun/06550d2dcd9cf6551887ee6305e...

There isn't a single thing there that seems iffy to me. Rust projects split themselves into crates as small as possible to 1) ease their own development, 2) improve compile times by making compilation trivially parallelizable, and 3) allow for reuse. Because of this, you can easily end up with a dozen crates all written by the same group of people, meant to be used together. Whether a project is a single big crate or a dozen small crates, you're in the exact same situation. If you wouldn't audit the small crates because there are a lot of them, you wouldn't audit the big crate thoroughly either.

But what about transitive dependencies? Similar thing: if there's a crate that checks the terminal width, I prefer to use the existing small crate rather than copy-paste its code. I can do the latter, but then I end up with an effectively vendored library in my code that no tool knows about, so nothing can warn me when a security vulnerability is found in it.

kouteiheika

> There isn't a single thing there that seems iffy to me.

You mean like four versions of hashbrown (which is useful, but it's rare to have to use it directly instead of `std::collections::HashMap`, never mind pulling four versions of it into your project) or four versions of itertools (which is extremely situational, and even when it is useful it usually only saves you a couple of lines of code, so it's essentially never worth pulling it in even once, never mind four times)? Or maybe three different crates for random number generation (rand, nanorand, fastrand)?

There's definitely a problem with how the Rust community approaches dependencies (and I say this as someone who loves Rust and has used it as their main language for 10+ years now). People are just way too trigger-happy with external dependencies, and burying our heads in the sand is not helping.

Inclusion of every external dependency should always be well motivated. How big is the dependency? How much of it do we use? How big of an effect will it have on compile times? How much effort would it be to write it yourself? Is it security sensitive? Is it a dependency which everyone uses and is maintained by well known community members, or some random guy from who knows where? And so on.

For example, cryptography stuff? No, don't write that yourself if you're not an expert; you'll get it wrong and expose yourself to vulnerabilities. Removing leading whitespace from strings? ("unindent" crate, which is also on your list) Hell no! That's like a minute or two to write this yourself. Did we learn nothing from the left-pad incident?
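For a sense of scale, here is a minimal sketch of the kind of helper meant by "a minute or two": a hypothetical `unindent` function that strips the smallest common run of leading spaces/tabs from every non-blank line. It is not the `unindent` crate's implementation, and it makes simplifying assumptions (only ASCII spaces and tabs count as indentation, tabs and spaces are not distinguished, blank lines become empty, a trailing newline is not preserved).

```rust
/// Hypothetical helper, not the `unindent` crate's code: remove the smallest
/// common leading run of ASCII spaces/tabs from every non-blank line.
fn unindent(s: &str) -> String {
    let is_indent = |c: char| c == ' ' || c == '\t';

    // Smallest indentation (in bytes) among lines that contain real content.
    let min_indent = s
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.len() - line.trim_start_matches(is_indent).len())
        .min()
        .unwrap_or(0);

    s.lines()
        .map(|line| {
            if line.trim().is_empty() {
                "" // blank or whitespace-only lines become empty
            } else {
                // Safe to slice: the first `min_indent` bytes are ASCII indentation.
                &line[min_indent..]
            }
        })
        .collect::<Vec<_>>()
        .join("\n") // note: a trailing newline in the input is not preserved
}

fn main() {
    let text = "    fn main() {\n        println!(\"hi\");\n    }";
    println!("{}", unindent(text));
}
```

Whether the crate's extra edge-case handling is worth a dependency is exactly the judgment call being argued about here.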

estebank

> You mean like four versions...

The two options for cargo here are 1) fail to compile when there's more than one crate-version in the dep tree or 2) allow for there to be more than one and let the project continue compiling. The former would be more "principled" but in practice incredibly disruptive. I usually go "dep hunting" to unify the versions of duplicated deps. Most of the time that's just looking at `cargo tree` and modifying the `Cargo.toml` slightly. Other times it's not easy, and I have to either patch or (better) wait until the diverging dep updates its own `Cargo.toml`.
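Cargo can report duplicates directly with `cargo tree --duplicates` (short form `-d`). To illustrate what that "dep hunting" looks for, here is a std-only sketch that scans the text of a `Cargo.lock` and lists crates resolved at more than one version. It assumes the simple `name = "..."` / `version = "..."` lines Cargo writes inside each `[[package]]` table; it is not a real TOML parser, and the lockfile fragment in `main` is made up.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Crates that appear at more than one version in a Cargo.lock (sketch only:
/// relies on Cargo's line layout rather than parsing TOML properly).
fn duplicate_versions(lockfile: &str) -> BTreeMap<String, BTreeSet<String>> {
    let mut versions: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    let mut current_name: Option<String> = None;

    for line in lockfile.lines() {
        let line = line.trim();
        if line == "[[package]]" {
            current_name = None; // a new package table starts
        } else if let Some(v) = line.strip_prefix("name = ") {
            current_name = Some(v.trim_matches('"').to_string());
        } else if let Some(v) = line.strip_prefix("version = ") {
            // The top-level lockfile `version = 3` has no preceding name, so it is skipped.
            if let Some(name) = current_name.take() {
                versions
                    .entry(name)
                    .or_default()
                    .insert(v.trim_matches('"').to_string());
            }
        }
    }

    // Keep only crates resolved at two or more distinct versions.
    versions.retain(|_, vs| vs.len() > 1);
    versions
}

fn main() {
    // Made-up lockfile fragment for illustration.
    let lock = r#"
version = 3

[[package]]
name = "hashbrown"
version = "0.12.3"

[[package]]
name = "hashbrown"
version = "0.14.5"

[[package]]
name = "itoa"
version = "1.0.11"
"#;
    for (name, vs) in duplicate_versions(lock) {
        println!("{name}: {vs:?}");
    }
}
```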

> People are just way too trigger happy with external dependencies, and burying our heads in the sand is not helping.

> Inclusion of every external dependency should always be well motivated. How big is the dependency? How much of it do we use? How big of an effect will it have on compile times? How much effort would it be to write it yourself? Is it security sensitive? Is it a dependency which everyone uses and is maintained by well known community members, or some random guy from who knows where? And so on.

We can have a nuanced discussion about dependencies. That's not what I was seeing. There are plenty of things that can be done to improve the situation, especially around Supply Chain Security, but this idea that dependency count is the issue is misguided. It pushes projects towards copy-pasting and vendoring, which makes that code opaque to security tools, existing or proposed. Think of the shitshow it would be if you had an app and decided "more dependencies is bad, so I'm copying xz into my repo".

> Removing leading whitespace from strings? ("unindent" crate, which is also on your list) Hell no! That's like a minute or two to write this yourself.

I don't have access to the closed-source repo to run `cargo tree` to see where `unindent` is used from, but why do you feel this is an invalid crate to pull in? It is a proc-macro, that deindents string literals at compile time. Would I include it directly in a project of mine? Likely not, but if I were using `indoc` (written by dtolnay), which uses `unindent` (written by dtolnay) my reaction wouldn't be "oh, no! An additional useless dependency!".

rectang

> I don't have access to the closed-source repo to run `cargo tree` to see where `unindent` is used from, but why do you feel this is an invalid crate to pull in?

Each additional dependency imposes an ongoing audit burden on the downstream consumers of your project.

In an era where supply chain compromises are increasing and their consequences are catastrophic, the security story alters the traditional balance of "roll your own" versus "use the shared library".

rcxdude

Which then increases the chance that your homebrew versions have their own security problems (or bugs in general).

kouteiheika

> but this idea that dependency count is the issue is misguided

Well, partially you're right. There are roughly two things which are important here:

1) The number of unique authors/entities controlling the dependencies. (So 10 crates by exactly same author would still count as one dependency.)

2) The amount of code pulled in by a crate. (Because this tanks your compile times; I've seen projects pulling in hundreds of thousands of lines of code in external dependencies and using less than 1% of that, and then people make a surprised Pikachu face that Rust is slow to compile.)
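Point 1 can be made concrete with a tiny sketch: count distinct controlling entities instead of crates. The `(crate, owner)` pairs below are hypothetical placeholders; in practice ownership information would have to come from somewhere like crates.io owner data, which this sketch does not fetch.

```rust
use std::collections::HashSet;

/// Count distinct owners ("trust domains") among (crate, owner) pairs:
/// ten crates published by the same owner count once.
fn trust_domains(deps: &[(&str, &str)]) -> usize {
    deps.iter()
        .map(|&(_crate_name, owner)| owner)
        .collect::<HashSet<_>>()
        .len()
}

fn main() {
    // Hypothetical dependency list: names and owners are placeholders.
    let deps = [
        ("foo-core", "team-a"),
        ("foo-macros", "team-a"),
        ("foo-util", "team-a"),
        ("bar", "someone-else"),
    ];
    println!("{} crates, {} trust domains", deps.len(), trust_domains(&deps));
}
```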

> I don't have access to the closed-source repo to run `cargo tree` to see where `unindent` is used from, but why do you feel this is an invalid crate to pull in? It is a proc-macro, that deindents string literals at compile time. Would I include it directly in a project of mine? Likely not, but if I were using `indoc` (written by dtolnay), which uses `unindent` (written by dtolnay) my reaction wouldn't be "oh, no! An additional useless dependency!".

I would never include either in any of my projects, and would veto any attempt to do so. As I already said, the 'unindent' crate is trivial to write by myself, and the 'indoc' crate seems completely not worth it from a cost/benefit standpoint in the very rare case I'd need something like that (it's easy enough to make do without it, as it's just a minor situational quality of life crate).

In general my policy on external dependencies is stricter than most people; I usually only include high value/high impact dependencies, and I try to evaluate whether a given dependency is appropriate in context of the concrete project I want to use it in. If it's a throwaway script that I need to run once and won't really maintain long-term - I go crazy with gluing whatever external crates there are just to get it done ASAP! But if it's a project that I'll need to maintain over a long period of time I get a lot more strict, and if it's a library that I expect other people to use then the bar for external dependencies gets even higher (because any extra dependency I add will bloat up the compile times and the dependency trees of any downstream users).

I also find it helpful to ask myself the question - if it wasn't easy to add new dependencies (e.g. if I was still writing in C++, or cargo wasn't a thing) would I still include this dependency in my project? If the answer is "no" then maybe it's better not to.

There are some notable exceptions, but sadly most of the Rust community doesn't do things this way.

arccy

there's 2 kinds of bugs related to security: accidental bugs, and maliciously injected bugs. xz was the second kind (which you could have avoided if you vendored starting at a reviewed / trusted point in time...)

from empirical studies, we know the first kind occurs at roughly the same rate everywhere, so it's just do you have capacity to fix it. also, reusable dependencies typically are more configurable which leads to more code and more bugs, many of which might not have affected you if you didn't need all the flexibility.

dependency count is an indirect measure of the second kind, except rust pushes crates as the primary metric, so it will always look bad compared to if it pushed something more reasonable like the number of trust domains.

pessimizer

Agreed, the dependency list looks extremely boring and completely auditable to me.

The dependencies are modular, not diffuse.

I think people saw the title, and got triggered into hate. When actually, this seems author-submitted, and they were probably just trying to be humble about their accomplishment. It's not even the title of the article.

tison

> they were probably just trying to be humble about their accomplishment

Thanks for your reply. To be honest, I simply see depending on open-source software as a trivial choice. Any non-trivial Rust project can pull in hundreds of dependencies, and even when you audit a distributed system written in C++/Java, that's common.

For example, Cloudflare's pingora has more than 400 dependencies. Other databases written in Rust, e.g., Databend and Materialize, have more than 1000 dependencies in the lockfile. TiKV has more than 700 dependencies.

People seem to jump into debating the number of dependencies or blaming us for closing the source code, ignoring what I actually wanted to show: how you can organically contribute to the open-source ecosystem during your DAYJOB, which is a sustainable way to write open-source code.

Starlevel004

You forgot 4: To break when somebody foolishly does a ``cargo install`` without passing ``--locked``.

arccy

lots of crates by different authors: you need to trust each one not to be compromised

lots of crates by a cohesive group of authors: you "only" need to trust the group reviews each others work properly and they're not all compromised together (less likely).

carlos-menezes

100 direct dependencies is insane.

kpcyrd

The title of the submission is somewhat bait, unfortunately the Cargo.lock doesn't seem to be public. Since my current Rust side-project also has some kind of database (along with, well, a p2p system) and also totals 454 dependencies, I've decided to do a breakdown of my dependency graph (also because I was curious myself):

  - 85 are related to gix (a Rust reimplementation of git, 53 of those are gix itself, that project is unfortunately infamous for splitting things into crates that probably should've been modules)
  - 91 are related to pgp and all the complexity it involves (aes with various cipher modes, des, dsa, ecdsa, ed25519, p256, p384, p521, rsa, sha3, sha2, sha1, md5, blowfish, camellia, cast5, ripemd, pkcs8, pkcs1, pem, sec1, ...)
  - 71 are related to http/irc/tokio (this includes a memory-safe tls implementation, an http stack like percent-encoding, mime, chunked encoding, ...)
  - 26 are related to the winapi (which I don't use myself, but are still part of the resolved dependency graph)
  - 8 are related to web assembly (unused when compiling for Linux)
  - 2 are related to android (also unused when compiling for Linux)

In some ways this is a reminder of how much complexity we're building on top of for the sake of compatibility.

Also keep in mind "reviewing 100 lines of code in 1 library" and "reviewing 100 lines of code split into 2 libraries" is still pretty much the same amount of code (if any of us actually reviewed all their dependencies). You might even have a better time reviewing the sha2 crate vs the entirety of libcrypto.so, if that's all you needed.

My project has been around for (almost) two years; I scanned every commit for vulnerable dependencies using this command:

    for commit in $(git log --all --pretty='%H'); do
        git show "$commit":Cargo.lock > Cargo.lock &&
            cargo audit -n --json |
            jq -r '.vulnerabilities.list[] | (.advisory.id + " - " + .package.name)'
    done | sort | uniq

I got a total of 25 advisories (basically what you would be exposed to if you ran all binaries from every single commit simultaneously today). Here's the list:

    RUSTSEC-2020-0071 - time
    RUSTSEC-2023-0018 - remove_dir_all
    RUSTSEC-2023-0034 - h2
    RUSTSEC-2023-0038 - sequoia-openpgp
    RUSTSEC-2023-0039 - buffered-reader
    RUSTSEC-2023-0052 - webpki
    RUSTSEC-2023-0053 - rustls-webpki
    RUSTSEC-2023-0071 - rsa
    RUSTSEC-2024-0003 - h2
    RUSTSEC-2024-0006 - shlex
    RUSTSEC-2024-0019 - mio
    RUSTSEC-2024-0332 - h2
    RUSTSEC-2024-0336 - rustls
    RUSTSEC-2024-0345 - sequoia-openpgp
    RUSTSEC-2024-0348 - gix-index
    RUSTSEC-2024-0349 - gix-worktree
    RUSTSEC-2024-0350 - gix-fs
    RUSTSEC-2024-0351 - gix-ref
    RUSTSEC-2024-0352 - gix-index
    RUSTSEC-2024-0353 - gix-worktree
    RUSTSEC-2024-0355 - gix-path
    RUSTSEC-2024-0367 - gix-path
    RUSTSEC-2024-0371 - gix-path
    RUSTSEC-2024-0373 - quinn-proto
    RUSTSEC-2024-0421 - idna

I guess I'm doing fine. Keep in mind, the binary is fully self-contained, there is no "look, my program has zero dependencies, but I need to ship an entire implementation of the gnu operating system along with it".

tison

I've updated the Gist with a full Cargo.lock file that can be audited - https://gist.github.com/tisonkun/06550d2dcd9cf6551887ee6305e...

Running

    cargo audit -n --json | jq -r '.vulnerabilities.list[] | (.advisory.id + " - " + .package.name)'

gives:

    RUSTSEC-2023-0071 - rsa

which is transitively introduced by sqlx-mysql while we don't use the MySQL driver in production.

1vuio0pswjnm7

At first I thought this was sarcasm.

647 points of failure.

synergy20

so, npm hell, or pip hell again?

to be fair, python pkg dependencies are fine to me; there might still be a lot of pip pkgs, but not the few hundred that npm and cargo normally pull in.

golang also has a reasonable amount of dependencies. npm and cargo dependencies are just scary due to the huge number.

eximius

NPM and pip hell come about for several reasons, one of the biggest being that package versions are global.

In Rust, project A can use dependencies B and C, which can both depend on different versions of D. Cargo/crates generally also solve some of the other metadata problems Python has.

This means the developer experience is _significantly_ improved, at a potential cost of larger binaries. In practice, projects seem to have sufficiently liberal bounds that duplication isn't an issue.

dboreham

Poster boy for all that's wrong with modern software modularity.

henning

I automatically don't want to use this database because the number of third-party dependencies is an unfixable, never-ending source of security vulnerabilities.

bityard

Sometimes I'm pretty sure people upvote stories just to see what happens in the comments.

Idiot211

Guilty as charged. To steal a phrase from Reddit, "the true LPT is in the comments"

The true insightful discussion comes in the comments.

callamdelaney

LPT?

orion138

I believe it means Life Pro Tip.

airstrike

Guilty as charged.

rectang

Yes, the amount of effort it takes to audit dependencies scales roughly linearly, so unless you're going to blindly install them, choosing to use a project with so many dependencies means taking on a tremendous amount of ongoing work.

estebank

> the amount of effort it takes to audit dependencies scales roughly linearly

With the lines of code, not the number of dependencies. 10 dependencies of 100 lines of code are arguably easier, but certainly not harder than a single dependency of 1000 lines of code.

rectang

I should clarify that I mean auditing dependency-publisher authentication, rather than full code review.

This returns us to status quo ante, back before supply chain attacks were something we worried about. Bugs and such from dependencies are an annoyance but a manageable problem. Supply chain attacks after publisher account compromise are catastrophic and are not manageable.

estebank

I see, I have a different mental model for what auditing a dependency means. Auditing is "review the code and release processes of my dependency". In my mind what you describe would be "validating my Software Bill of Materials". It doesn't mean that either of us is wrong on what we call auditing, it just explains why sometimes we end up talking past each other in these conversations.

marcosdumay

> auditing dependency-publisher authentication

What does this mean?

It means you'll trust the random people pushing code to cargo if you can prove they indeed are the random people they claim to be?

rectang

When a primary dependency is added to a project, its publishers are evaluated for trustworthiness; it's possible that a dependency might be ruled out if its authors seem sketchy or insufficiently concerned with security. Different organizations might have different standards for what they'd accept, but in any case, this evaluation only needs to happen once.

Afterwards, it suffices to validate with each dependency update that the publisher is the same publisher that was evaluated before.
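A minimal sketch of that check, under stated assumptions: record the publisher approved during the one-time evaluation, then on each update accept only if the publisher is unchanged. `Approvals`, `Verdict`, and how a publisher identity is obtained are all hypothetical; the sketch does not query any registry or verify signatures.

```rust
use std::collections::HashMap;

/// Outcome of checking one dependency update against earlier approvals.
#[derive(Debug, PartialEq)]
enum Verdict {
    Accept,
    /// Publisher differs from the one evaluated earlier: needs re-review.
    PublisherChanged { approved: String, seen: String },
    /// The dependency never went through the one-time evaluation.
    NotEvaluated,
}

/// Hypothetical store: crate name -> publisher approved at evaluation time.
struct Approvals {
    by_crate: HashMap<String, String>,
}

impl Approvals {
    fn new() -> Self {
        Approvals { by_crate: HashMap::new() }
    }

    /// One-time evaluation passed: remember who the publisher was.
    fn approve(&mut self, krate: &str, publisher: &str) {
        self.by_crate.insert(krate.to_string(), publisher.to_string());
    }

    /// Per-update check: only the publisher identity is compared.
    fn check_update(&self, krate: &str, publisher: &str) -> Verdict {
        match self.by_crate.get(krate) {
            None => Verdict::NotEvaluated,
            Some(approved) if approved.as_str() == publisher => Verdict::Accept,
            Some(approved) => Verdict::PublisherChanged {
                approved: approved.clone(),
                seen: publisher.to_string(),
            },
        }
    }
}

fn main() {
    let mut approvals = Approvals::new();
    approvals.approve("some-crate", "alice");

    println!("{:?}", approvals.check_update("some-crate", "alice"));
    println!("{:?}", approvals.check_update("some-crate", "mallory"));
    println!("{:?}", approvals.check_update("other-crate", "bob"));
}
```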

wslh

Nowadays this applies to everything that depends on modules that depend on more modules (e.g. NodeJS).

arccy

yeah, rust copied the dumpster fire that was npm, i shudder to think of the future of supply chain security when people say rewrite it in rust.

marcosdumay

I'm pretty sure everybody just copied from Perl.

Go did something nice, and it would be good if more people copied. But it was also fairly recent.

cb321

Almost - CTAN (T for "TeX") predated CPAN by about 1 year (but may not have ever had as much automated fetching involved).

norman784

What would be a better model for managing dependencies, in your opinion? I do like that it's easy to add dependencies, but I also don't like that a simple hello world Axum app is, IIRC, around 150 dependencies.

yoyohello13

Rust's problems are not necessarily dependency management, cargo is actually great at it, but that they rely on third party dependencies for critical components (like regex and async). Which makes it very difficult to build anything without 300 dependencies.

I understand why they do it. It's led to some amazing crates like serde. But I think I fall more in the camp of Python, Go or Odin with a comprehensive standard lib. You can make a whole game with Odin with standard library only. Or an entire web app in Go.

kouteiheika

> but that they rely on third party dependencies for critical components (like regex and async).

Regex is not a third-party dependency:

https://github.com/rust-lang/regex

yoyohello13

My bad. Maybe I was thinking about regex extensions crate.

arccy

have an ecosystem that encourages larger, more well thought out dependencies.

the thin standard library and flat package namespace encourages land grabs for short memorable names for packages that just do a single thing. compared to say java or go where dependencies don't exist because they sound cool but because they solve a real problem.

reaperducer

You don't have to have a solution to recognize that there is a problem.

kibwen

This is both right and wrong in a pernicious way.

When pointing out a problem, you don't necessarily need to provide a better solution. However, if you refrain from providing a better solution, you are still implicitly asserting that there exists some better solution.

So then it's possible to counter that with: a better solution may not exist. If you think a better solution does exist, then the burden of proof is on you to point out an existing solution that does better, or to otherwise establish that some better solution must exist.

Rust could very well be at a global optimum for the problems it's trying to solve. Sometimes tradeoffs are just inevitable.

jpc0

I would say, evaluate how much work it would take to build it yourself, and if that is larger than your scope allows ask yourself some serious questions about whether the solution you came up with is overly complex.

Some problems are hard to solve. But not all of them are.

An example, and this is an observation: where can I grab a library that just parses HTTP 1.0, HTTP 1.1 and HTTP 2.0 messages? Not an HTTP framework, something along the lines of httparse. I pass it a buffer of bytes and out the other end pops a Result<HTTPResponse, Error>.

Sure there are hard problems to solve there but they are API design problems.

I don't need tokio or some sort of web abstraction. If I want to use HTTP as a transport over carrier pigeon I want to be able to do that with said library.

Doesn't exactly exist though, and because everyone just pulls in Tokio (I do mean the entire Tower stack or whatever it's called these days) nobody even notices the issue. And every single HTTP server rewrites that functionality with slightly different edge cases and bugs.

That's basic internet infrastructure right there and we can't get a canonical library for it, yet you are arguing that 3 different implementations of the same hash function pulled into the same project is viable?
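A minimal sketch of the "bytes in, Result out" shape being asked for, limited to the head (status line plus headers) of an HTTP/1.0 or HTTP/1.1 response and using only std. `parse_response_head`, `ResponseHead` and `ParseError` are hypothetical names, not httparse's API. It deliberately skips bodies, chunked encoding, obsolete line folding and HTTP/2 (a binary framed protocol that needs a different parser), and it does no I/O, so the bytes can come from a socket or a carrier pigeon.

```rust
/// Parsed status line and headers of an HTTP/1.x response.
#[derive(Debug, PartialEq)]
struct ResponseHead {
    version: String,
    status: u16,
    reason: String,
    headers: Vec<(String, String)>,
    /// Bytes consumed by the head, i.e. where the body starts in the buffer.
    head_len: usize,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    /// The blank line ending the head hasn't arrived yet; feed more bytes.
    Incomplete,
    Invalid(&'static str),
}

/// Parse the head of an HTTP/1.x response from a byte buffer. No I/O, no runtime.
fn parse_response_head(buf: &[u8]) -> Result<ResponseHead, ParseError> {
    // The head ends at the first CRLF CRLF.
    let end = buf
        .windows(4)
        .position(|w| w == b"\r\n\r\n")
        .ok_or(ParseError::Incomplete)?;
    let head = std::str::from_utf8(&buf[..end]).map_err(|_| ParseError::Invalid("non-UTF-8 head"))?;

    let mut lines = head.split("\r\n");
    let status_line = lines.next().ok_or(ParseError::Invalid("empty head"))?;

    // Status line: HTTP-version SP status-code SP reason-phrase
    let mut parts = status_line.splitn(3, ' ');
    let version = parts.next().unwrap_or("");
    if version != "HTTP/1.0" && version != "HTTP/1.1" {
        return Err(ParseError::Invalid("unsupported version"));
    }
    let status = parts
        .next()
        .and_then(|s| s.parse::<u16>().ok())
        .filter(|s| (100..=999).contains(s))
        .ok_or(ParseError::Invalid("bad status code"))?;
    let reason = parts.next().unwrap_or("").to_string();

    // Each remaining line is "Name: value".
    let mut headers = Vec::new();
    for line in lines {
        let (name, value) = line.split_once(':').ok_or(ParseError::Invalid("header without colon"))?;
        if name.is_empty() || name.ends_with(' ') {
            return Err(ParseError::Invalid("bad header name"));
        }
        headers.push((name.to_string(), value.trim().to_string()));
    }

    Ok(ResponseHead { version: version.to_string(), status, reason, headers, head_len: end + 4 })
}

fn main() {
    let bytes = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello";
    match parse_response_head(bytes) {
        Ok(h) => println!("{} {} {} {:?}, body at byte {}", h.version, h.status, h.reason, h.headers, h.head_len),
        Err(e) => println!("error: {:?}", e),
    }
}
```

Returning `Incomplete` instead of blocking is what keeps the parser independent of any transport: the caller appends more bytes and retries.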

arccy

it may be that rust tried to solve for the wrong problems, so while it may be at the global optimum, the foundation is just broken.

that said, design choices like a flat package namespace are inexcusable. even npm started to move away from it.

eknkc

Is the dependency count supposed to be impressive?

speed_spread

Past a number of dependencies, actually getting anything to build deterministically, run reliably and then not get 0wnd to bits becomes an actual challenge, which many enthusiastic developers have a masochistic kink for.

The thrill of complexity is real.

jjtheblunt

i think the implication is that it's precarious...how does one know all are bug free, for example?

thinkharderdev

Is it? You know for a fact that there are bugs in some of your dependencies. But how many bugs would the code you wrote from scratch instead of adding a dependency have?

jjtheblunt

Are you asking if it is the implication, or if it is the implication that which is implied is true?

thinkharderdev

Asking if the implication is true that having more dependencies is on net bad for security for a complex system. The alternative being reimplementing whatever you would otherwise pull in a third-party dependency for. On the one hand, you reduce the attack surface in your supply chain. On the other hand you run the risk of introducing security bugs in the code you write that is outside your domain of expertise. It's not at all clear to me which one would be more important.