Unification (2018)

59 points 13 comments 14 hours ago

primitivesuave

Brought back memories to over a decade ago with Mathematica. It's quite easy to pattern match simple expressions like:

``` expr = foo[bar[k], baz[V]]; expr /. foo[x_, baz[y_]] :> {x, y} ```

But in real-world use cases, this paradigm quickly exposes some unhandled edge cases, or cases where the evaluation becomes prohibitively expensive. Nowadays I use TypeScript, and this has ignited some curiosity into finding out if TS does anything to optimize for this in type inference.

asplake

That’s kinda what brought unification to my attention. For my own education I’m writing a compiler for a simple ML-style language. Enjoying Pierce’s Types and Programming Languages meanwhile.

piinbinary

Unification is the core of Algorithm W, aka Hindley–Milner type inference. It's at the core of the type inference algorithms for languages like Haskell, OCaml, and standard ML.

moi2388

“ lowercase letter represents a constant (could be any kind of constant, like an integer or a string) An uppercase letter represents a variable”

I already disagree with this syntax.

brodo

That's the default way it has been done in Prolog since forever. If you have static typeing, like in Soufflé, you can get around it.

maweki

Datalog does not need/do unification for rule evaluation, as it is just matching variables to values in a single direction. Body literals are matched against the database and the substitutions are applied to the rest of the rule and the head.

Prolog does unification of the proof goal with the rule head. It's necessary there but not with datalog.

kevindamm

While bottom-up evaluation is the norm in datalog, it is not a requirement and there are datalog engines that evaluate top-down or all-at-once.

But I still agree with you about the capitalization. Some formats, like KIF, use a '?' prefix instead, and I've seen some HRF notations that mix the '?' prefix with non-KIF formatting (':-' operator and '.' terminator).

cubefox

It's also used in the theory of formal grammars, where terminal symbols (constants) are lower case, and non-terminal symbols (variables) are upper case: https://en.wikipedia.org/wiki/Formal_grammar#Introductory_ex...

moi2388

I did not know this, thanks

cubefox

Where do you need full type unification rather only type pattern matching?

housecarpenter

Consider the identity function f, which just takes an argument and returns it unchanged, and has the polymorphic type a -> a, where a is a type variable. What's the type of f(f)?

Obviously, since f(f) = f it should be a -> a as well. But to infer it without actually evaluating it, using the standard Hindley-Milner algorithm, we reason as follows: the two f's can have different specific types, so we introduce a new type variable b. The type of the first f will be a substitution instance of a -> a, the type of the second f will be a substitution instance of b -> b. We introduce a new type variable c for the return type, and solve the equation (a -> a) = ((b -> b) -> c), using unification. This gives us the substitution {a = (b -> b), c = (b -> b)}, and so we see that the return type c is b -> b.

But if we use pattern matching rather than unification, the variables in one of the two sides of the equation (a -> a) = ((b -> b) -> c) are effectively treated as constants referring to atomic types, not variables. Now if we treat the variables on the left side as constants, i.e. we treat a as a constant, we have to match b -> b to the constant a, which is impossible; the type a is atomic, the type b -> b isn't. If we treat the variables on the right side as constants, i.e. we treat b and c as constants, then we have to match a to both b -> b and c, and this means our substitution will have to make b -> b and c equal, which is impossible given that c is an atomic type and b -> b isn't.

guerrilla

I think in dependently typed programming languages and theorem proves. As I understand it, even System F (like Haskell) benefits, no?

cubefox

I guess in System F, if a function f accepts a complex type A, and another function g returns a complex type B, both types could involve type variables. Then for the compiler to check whether the expression f(g) is valid, it (the compiler) needs to determine whether a unification of A and B is possible. Not sure though.