Fun with Finite State Transducers

45 points 6 comments 4 days ago

zokier

I've been thinking that FST could be well suited for building routers for web frameworks. I.e. matching request path `/foo/42` to a set of route patterns like `/foo/{id}` etc. As I understand FST should be near perfect for this use and could potentially allow very good performance. Especially if you can construct the FST at compile time. Somewhat surprisingly I haven't seen anyone doing anything in that direction though, and FST literature is bit difficult if you don't have formal NLP background

rickette

Radix Trees are (often?) used for this purpose, for example in Chi: https://github.com/go-chi/chi. Coincidentally FSTs and Radix trees share some similarities.

woodruffw

> Coincidentally FSTs and Radix trees share some similarities

Indeed -- I think a useful way to comprehend FSTs is as a radix/prefix trie that also allows suffix compaction. There's probably a formal correlation proof between the two, but I don't know it off the top of my head.

jyndbeb

I'm not quite sure I understand how they are applicable, since access patterns itself can be arbitrary complex

For example how is github.event.pull_request.labels[another_variable].name handled?

yorwba

Since it only needs to classify the result as one of "fixed", "structured" or "arbitrary", it doesn't matter what the value of "another_variable" is, you'll always be in the same state at the end. Finite state transducers can handle an infinite number of possibilities with a finite number of states just fine.

It's a bit strange though that the author constructs the FST to only recognize a finite list of strings. Probably some of the matching logic that could have been part of the FST is pushed into the normalizer instead.

woodruffw

Author here.

> Probably some of the matching logic that could have been part of the FST is pushed into the normalizer instead.

Yep, exactly -- there's a great deal of prenormalization that in principle could be pushed into automata run over the FST instead. The reason I didn't do that is laziness, but it's the obvious next step :-)