The presentation introduces Unison, a novel programming language and ecosystem built on the principle of content-addressable programming. By referencing code by its content (hash) rather than by its name, Unison aims to deliver stronger integrity guarantees, simpler refactoring, and a more robust development workflow.
So, how many of you were able to read the code on the slides I just showed and understand everything perfectly? Hardly anyone who is doing programming on a database would be unable to follow along with these code lines. Can you guess what programming language this is?
>> This is the ALGOL 68 programming language, older even than myself.
So the question is: in terms of the internet age, this is an ancient programming language. You correctly noted that it is a predecessor to Pascal, Modula-2, and Oberon, that line of languages. But how come we can understand this programming language even today? This probably means some of the fundamental things are still the same up to this day.
So in the age of AI, well, the present age of the last two years, the question is how to stay relevant in our profession. Most of us, under the pressure of the hype, the media, and the influencers, will start heavily learning how to use AI, how to do vibe coding. But one more way to stay relevant and secure your future is to learn something that is not mainstream. Well, you will be gambling a bit, because today everyone knows Java, Python, JavaScript, object-oriented programming; that's the mainstream. That's the baseline.
But if you know, for example, the actor model or event modeling, if you're experimenting with WebAssembly, these are the things that are coming. They are on the horizon. We are pretty sure that sooner or later WebAssembly will become mainstream. However, knowledge of these techniques or paradigms, like functional programming, will get you into the top 10 or 15 or 20%.
Beyond that, you need to look into things which are currently being researched, like quantum computing or formal verification. You need to at least get informed about them so you can follow the progress of these research topics, and maybe gamble a bit but win in the long run. So today I'm going to present something that is still in the research phase, and that is content-addressable programming. One implementation of this concept is the Unison programming language.
Anyone here who read the red book about functional programming in Scala? Well, now you know what the authors did after publishing the book. Paul Chiusano and Rúnar Bjarnason started working on the Unison programming language. Unison is very similar to Haskell. It's a modern, strongly typed, purely functional programming language, strictly evaluated.
It's open source, and the best thing of all: these three guys, Arya Irani along with them, are developing not only a new programming language, they are developing a complete universe, and they got funding. So believe it or not, there are venture capitalists in this world who are willing to invest in something that might not be profitable in the near future, that might not be profitable ever, but they are supporting this noble goal. And from the founders' perspective, they are doing something they love, with passion, and getting paid for it. You cannot get better than that.
Research on the Unison programming language started in 2013, and the first public announcement was in 2014 at the Strange Loop conference. Two years after that, Unison Computing was incorporated, and the first public alpha was released in 2019. In 2021, the Unison Share community hub was released. Remember, they're building everything from scratch, a whole universe, so this would be the equivalent of GitHub, a code-sharing hub.
In 2024 they launched Unison Cloud, which is the cloud as it was supposed to be. Because if you remember, 20 years ago we were maintaining servers ourselves. Then the promise of the public cloud arrived, and the promise was: pay us, we will maintain it for you, we have all these security certificates. Where are we today? You need a certified expert for the public cloud just to maintain your account. So something is not right there. Unison Cloud is the promise of the cloud as it was supposed to be. And this year they released bring-your-own-cloud, so you can now install the Unison Cloud infrastructure on your own hardware.
So Unison is centered around one big idea: definitions should not be referenced by name. Definitions should be referenced by content.
So how much can one sentence change? Let's see today how it works in practice. You have a definition of your function in the Unison programming language, and you run the compiler. The compiler will perform syntax analysis and semantic analysis. It will strip the definition down to an abstract syntax tree, and then the compiler will compute a SHA-3 cryptographic hash.
In practice, when you start calling one function from another, you're still working with the names, but under the hood, the names of the functions you're calling will be converted into hashes. So instead of the name of the function inc being contained in the abstract syntax tree, you have a cryptographic hash of the code which will be executed at that place.
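The idea above can be sketched in a few lines of Python. This is purely illustrative: the toy "codebase" dict, the name-substitution trick, and the truncated hex digests are my assumptions, not Unison's real codebase format (Unison hashes a normalized abstract syntax tree, not source text).

```python
import hashlib

def sha3(src: str) -> str:
    # Content hash of a definition (Unison uses SHA3-512; truncated here for readability).
    return hashlib.sha3_512(src.encode()).hexdigest()[:16]

codebase = {}   # hash -> stored definition
names = {}      # name -> hash (names are only metadata)

def add(name: str, body: str) -> str:
    # Replace each referenced name with the hash it points to, so the
    # stored form (and therefore the hash) depends on content, not names.
    resolved = body
    for n, h in names.items():
        resolved = resolved.replace(n, "#" + h)
    h = sha3(resolved)
    codebase[h] = resolved
    names[name] = h
    return h

inc_hash = add("inc", "x -> x + 1")
two_hash = add("addTwo", "x -> inc (inc x)")  # stored with #<hash of inc>, not "inc"
print(codebase[two_hash])
```

Note how the stored body of addTwo contains the hash of inc rather than the name, which is exactly why renaming inc later cannot break addTwo.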
And the typical question is: what are the chances of a collision? If you compute the hash function from the code, what is the probability there will be a clash? I suppose most of us are using git today. Do you know which cryptographic function git is using? SHA-1. So, converted into human language: if every atom on Earth spat out a fresh hash every nanosecond, you would expect your first collision in 46 billion years. In other words, there are high chances something else will happen before a clash on the SHA-3 horizon.
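The "every atom on Earth" figure is a birthday-bound estimate, and it can be reproduced roughly. The constants below (atom count, hash rate) are my own assumptions, so the result lands in the same tens-of-billions-of-years ballpark as the speaker's 46 billion rather than matching it exactly.

```python
import math

bits = 512                                  # Unison hashes with SHA3-512
atoms_on_earth = 1.33e50                    # rough textbook estimate
hashes_per_second = atoms_on_earth * 1e9    # one hash per atom per nanosecond

# Birthday paradox: ~50% collision probability after about sqrt(2^bits) hashes.
hashes_for_collision = math.sqrt(2.0 ** bits)
seconds = hashes_for_collision / hashes_per_second
years = seconds / (3600 * 24 * 365)
print(f"{years:.1e} years")
```

With these constants the answer comes out around 10^10 years, i.e. longer than the current age of the universe, which is the point of the anecdote.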
So let's see what the consequences are of this simple sentence, this simple concept around which the whole language is built. First thing: every function, every term, every definition, every data structure will be converted into binary, and a hash will be computed. Names still do exist; as a developer, of course, you will not write the hash of a function, you will use names, but the names are just metadata.
So internally, hashes are stored, and the names are attached as metadata, which means that you can change the name of a function at will, instantly, without breaking anything. We eliminate a whole class of naming-conflict problems. So renaming is instant. It's painless. It doesn't break anything, because the hash is not changed, and that is what our references are built upon: hashes. It's a simple, non-breaking operation, and it takes some time for these things to sink in, because when you're using any kind of IDE, you can recall what the simple act of renaming a function means: recompiling everything and maybe detecting name clashes.
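Rename-as-metadata can be sketched in Python. Again a toy model under my own assumptions, not Unison's storage layout: the point is only that the rename touches the name map, never the content-addressed store.

```python
import hashlib

def sha3(src: str) -> str:
    return hashlib.sha3_512(src.encode()).hexdigest()[:16]

# The codebase maps hash -> code; names live in a separate metadata map.
codebase = {}
names = {}

h = sha3("x -> x + 1")
codebase[h] = "x -> x + 1"
names["inc"] = h

# Renaming touches only the metadata map. No code changes, nothing recompiles,
# because every reference to this function is already the hash, not the name.
names["increment"] = names.pop("inc")
print(names)
```
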
Every function is versioned separately. So what you can see here is a sequence of refactorings of a function. In this case I'm not changing the name, I'm not renaming the function; I'm changing the implementation. This is a functional programming language, and the functional nature of the code is propagating. So this is an append-only codebase. Unlike git, where you introduce a new version of a function and simply override the previous one, here, as you're refactoring, you will produce new versions of the same function, and from the perspective of the append-only codebase, these will be completely new functions. So this means that you can evolve a function over time, and again there won't be any kind of clashes, as you will see. So each function will be compiled separately and stored inside a database.
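The append-only behavior can be shown with the same toy store as before. A minimal sketch, assuming a hash-keyed dict stands in for the codebase: updating a function under the same name adds a new entry, and the old version is never overwritten.

```python
import hashlib

def sha3(src: str) -> str:
    return hashlib.sha3_512(src.encode()).hexdigest()[:16]

codebase = {}   # append-only: hash -> code; nothing is ever overwritten
names = {}      # name -> *current* hash

def update(name: str, body: str) -> None:
    h = sha3(body)
    codebase[h] = body
    names[name] = h

update("parse", "v1 implementation")
v1 = names["parse"]
update("parse", "v2 implementation")   # a refactoring: a brand-new entry
v2 = names["parse"]

# Both versions still live in the codebase; only the name moved.
print(v1 != v2, v1 in codebase, v2 in codebase)
```
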
And this may sound a bit scary, storing code in a database, but the thing is, we are all doing it today. We are just not fully aware of it, because git is a database of code, and with git you're using a specialized tool. Well, with Unison you also use a specialized tool; it's just that it's called UCM, the Unison Codebase Manager, and the commands are pretty much the same, because in essence you're doing the same operations you would do anyway. You produce new code, you store it in a database, you may fetch it, version it, push it to the central repo for code sharing. It's only that git is storing textual files; git is working with textual files, while here we are working with functions. Git is computing hashes of, you could say, the content of the file system. Unison is computing hashes of abstract syntax trees computed from functions. When you look at the cryptographic functions, git is using SHA-1, Unison is using SHA-3. So stronger guarantees there as well.
But probably the most drastic difference between git and Unison is that in git you have syntactic diffs on the textual files. An interesting consequence of this way of working with the code is that you have a guarantee that your repository will never, ever be broken. Why? Because if you can compile your code, you will be able to store it inside the codebase. If you cannot compile it, no hash is produced from the syntax tree, and you don't have anything to store.
So if my colleague is pushing code to the codebase, he's actually pushing syntactically and semantically correct code that was converted into binary format. So when I pull it from the codebase, I have a strong guarantee that I will be able to compile it. And again, all these things will sink in after this talk, because it takes some time to understand what we took for granted up until now. What are the possibilities that this is opening? No more textual files. And this sounds scary.
Why is that? Because the Unison Codebase Manager is not working with textual files. It's working with functions, which are compiled and stored inside the database. Usually, when you're working with Unison, you have a single textual file called the scratch file, and the cycle of development looks like this. You start with an empty scratch file. You write a function. You compile it, and if everything is good, you add it to the codebase, and after that you can completely delete it, wipe it out from the scratch file, because you don't need it anymore. You're not storing textual files in your version control; you're storing compiled functions. If you want to edit something, there's a command, edit, followed by a function name, which will load the binary representation of the function, convert it into Unison code, and place it in your scratch file for you to edit. After you're done with the editing, you will update the definition in the codebase. And again, updating a definition is creating a new function; even though the name is the same, multiple versions of the same function can exist at the same time.
How come no more builds? Well, technically yes, you have builds, because you need to compile, to build your code into a binary representation to store it inside the database. But this is, you could say, perfect incremental compilation. Why? Because at any point in time you have compiled code in your codebase. You will write one, two, three functions. They will be compiled almost instantly, and you can store them inside the database.
So your perception, since we are doing incremental compilation all the time, is that there is zero build time, zero compilation time. And again, the consequences of this are big, because if we are working together on the same project and I'm a new member of your team, what I will do is check out the complete project. What arrives to me locally is binary code, already compiled, stored in the database. I'll be able to run it right away. Someone else did the compilation for me. So I'm pulling the binary representation, and I'm able to execute it right away. Think of CI/CD pipelines. How long does a build take? Well, in this scenario, the CI/CD pipeline does not have anything to build, and it's a question whether it will continue to exist in the same form. So as a general rule, if your function did not change, there's no need to recompile anything, because compilation is done once, and the result of the compilation is stored inside a database.
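"Compile once, keyed by content" is just a cache whose key is the hash. A minimal sketch, with a counter standing in for actual compiler work; the dict-as-codebase is my assumption, not Unison's implementation.

```python
import hashlib

def sha3(src: str) -> str:
    return hashlib.sha3_512(src.encode()).hexdigest()[:16]

compiled = {}       # content hash -> "binary": a compile-once cache
compile_count = 0

def compile_fn(src: str) -> str:
    global compile_count
    h = sha3(src)
    if h not in compiled:           # unchanged content => cache hit, zero build time
        compile_count += 1
        compiled[h] = f"<binary {h}>"
    return compiled[h]

compile_fn("x -> x + 1")
compile_fn("x -> x + 1")    # identical content: nothing to rebuild
compile_fn("x -> x + 2")    # new content: compiled exactly once
print(compile_count)        # → 2
```
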
As I said, probably the biggest change is the introduction of semantic version control. You can see on line two something that will be detected by both git and the Unison Codebase Manager: the netConnection function replaces something that was there previously. Every version control system will detect that. But as you can see on lines five, six, and seven, I have a detection here that I'm calling three functions, Connection.send, Connection.receive, and upgrade, and these functions were changed. How does Unison know that? Well, the hashes changed. Other people changed the implementation of these three functions. They compiled them, and as a result of compilation, the hashes changed. And now Unison can offer me semantic version control that is able to detect that three functions I'm calling changed their implementation.
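The detection mechanism can be sketched as follows. The key assumption (consistent with the talk) is that a definition's hash covers the hashes of its callees, so a change anywhere below propagates upward; the function names and the "body|deps" encoding here are my own illustration.

```python
import hashlib

def sha3(s: str) -> str:
    return hashlib.sha3_512(s.encode()).hexdigest()[:16]

def definition_hash(body: str, dep_hashes) -> str:
    # A definition's hash covers its body plus the hashes of everything it
    # calls, so any change in a dependency changes this hash too.
    return sha3(body + "|" + "|".join(sorted(dep_hashes)))

send_v1 = sha3("Connection.send v1")
recv_v1 = sha3("Connection.receive v1")
net_v1 = definition_hash("netConnection body", [send_v1, recv_v1])

# A colleague refactors Connection.send; its hash changes...
send_v2 = sha3("Connection.send v2")
net_v2 = definition_hash("netConnection body", [send_v2, recv_v1])

# ...so a semantic diff can report exactly which callee changed.
old_deps = {"Connection.send": send_v1, "Connection.receive": recv_v1}
new_deps = {"Connection.send": send_v2, "Connection.receive": recv_v1}
changed = [name for name in old_deps if old_deps[name] != new_deps[name]]
print(changed, net_v1 != net_v2)
```
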
And this is opening so much space for various new things. Think of it: you have referential integrity, because if you're calling something, you're not calling it by name anymore. A name is a bit like a shield: up until now, you had no information about what was hiding behind the shield, but now you're working with hashes, which represent the implementation. So whole classes of attacks are not possible anymore.
Today we had a talk about security in the supply chain, about the various attacks that can occur. Here, you have a guarantee of the integrity of your code. If anyone changes the implementation of a function your code is calling, you will explicitly know about that. You will not stand there naked, being attacked by various changes in the implementation. And again, it's interesting that one simple idea can propagate on so many levels.
In Unison, as I said, multiple versions can coexist in peace. In git, you have a situation like this: you wrote a utility function, and you call it from three places. If you refactor this function and produce a new version of your utility function, in all the places where you call this function, you need to upgrade the call references. In Unison, you have the ability to refactor slowly. You're not forced to refactor everything all at once; or, if you have a specific need for some pieces of your code to call previous versions of the function, you will be able to do so.
That's unlike git, which overrides previous implementations so that the only thing you can access is the latest version of the function; in Unison, multiple versions coexist. Dependency hell exists; I saw it. So the typical reasons for dependency hell are transitive dependencies and the so-called diamond structure.
So let's imagine I'm writing an application, in version one, which is calling utility function B, depending on version one of a library. Over time, a second version of the library is released. Some of the functions have been changed, some of the functions may have been removed, and they added something I need. So for the implementation of function C, I will use the library in version two. What will happen if I try to write and compile code which is calling C? Now I have a problem, because I have two versions of my library in my code.
In most mainstream languages, this cannot exist. This is impossible. In Unison, this is indeed possible, to the extent that if I want, I can have three, five, or ten versions of the base library. I'm not saying it's good. This is just a tool, and you can misuse it, so you need to be careful, but things like this become possible. You will be able to ship a project like this. You will buy yourself some time to refactor, but if you need to release on Tuesday, you will not break the guarantees you gave to your customers. You will be able to launch your product, release the next version, and then go and refactor things later on. Of course, this is a possibility, not a recommendation. This is not a best practice.
What you naturally want to strive for is using just a single version, and now it's a matter of a bit of discipline to follow this goal. But you will not be undermined by the technology you're using.
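The diamond scenario dissolves under content addressing, and it can be sketched with the same toy store: two versions of the helper live under different hashes, so callers B and C each pin the version they were compiled against. The names and bodies here are hypothetical.

```python
import hashlib

def sha3(src: str) -> str:
    return hashlib.sha3_512(src.encode()).hexdigest()[:16]

codebase = {}

def add(src: str) -> str:
    h = sha3(src)
    codebase[h] = src
    return h

# Two versions of the same library function coexist under different hashes...
helper_v1 = add("helper v1")
helper_v2 = add("helper v2")

# ...so B can keep calling v1 while C already uses v2: no diamond conflict,
# because each caller references a hash, not the single name "helper".
b = add(f"B calls #{helper_v1}")
c = add(f"C calls #{helper_v2}")
print(len(codebase))
```
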
Again, a side effect of the way Unison works with the code is built-in serialization. In the Unison community there is a saying: the program is the protocol. So the compiled, flat binary representation is actually a serialized version of your code. It will be compiled once, of course, stored in the database, and then you can reuse it as many times as you want for whatever purposes you might need. There's no need to write serialization or deserialization for a data structure. If you can compile it, you just got a serialized version of your data structure, no matter what it is. So no need to write serialization or deserialization, no marshalling needed, no protocols. You're just saving time. And this is coming as a side effect of the technology you're using.
Also, if you store any kind of definition or data inside a database, this is strongly typed, compiled code you're storing. So later on you can load it; it will be properly deserialized and loaded into your application, and you have a guarantee of data integrity. There's no possibility, on a technical level, that you will load something into a wrong data structure, or that the data will be corrupted, even if you're evolving your data structures over time. Again, we're coming back to the fact that Unison can keep multiple refactored versions of the same data structure over time.
So, for example, if you're doing event sourcing, inevitably you will probably be forced to evolve the definition of your events over time, and this is a challenge. With Unison, you can support multiple coexisting versions of your events, living in peace, and the code you're writing will continue working without any changes, because you will never run into a situation of deserializing your data or your events in an improper way and passing them on to something that will not be able to handle them. No more impedance mismatch.
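One way to picture the data-integrity guarantee is to store each value together with the hash of the type definition it was written under; loading under a different definition is then rejected rather than silently misread. This is my own illustration of the principle, not Unison's actual runtime format; the Event type and the JSON encoding are hypothetical stand-ins.

```python
import hashlib, json

def sha3(s: str) -> str:
    return hashlib.sha3_512(s.encode()).hexdigest()[:16]

store = {}

def save(key, type_def, value):
    # Persist the value with the hash of the type definition it was written under.
    store[key] = {"type": sha3(type_def), "data": json.dumps(value)}

def load(key, type_def):
    entry = store[key]
    # Refuse to deserialize under a type whose definition hash doesn't match:
    # an evolved event type gets its own hash, so old data can't be misread.
    if entry["type"] != sha3(type_def):
        raise TypeError("value was stored under a different type definition")
    return json.loads(entry["data"])

event_v1 = "type Event = { user : Text }"
event_v2 = "type Event = { user : Text, ip : Text }"   # an evolved version

save("evt-1", event_v1, {"user": "ana"})
ok = load("evt-1", event_v1)                # same definition: loads fine
try:
    load("evt-1", event_v2)                 # evolved definition: rejected, not corrupted
    mismatch_caught = False
except TypeError:
    mismatch_caught = True
print(ok, mismatch_caught)
```
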
And again, one more side effect: our unit tests are functions as well. Function calls with arguments are just function calls. They will be compiled once. They can be cached, or, to call it differently, stored in the Unison codebase, and they can exist there indefinitely. The only time your tests are invalidated is when you refactor the function that you're testing.
But if you have a long-living definition of a function, your tests are executed once, you compile them, you store them inside the database, and that's it. No need to pay attention to them, no need to run them again, because why would you want to run a test again against the implementation of a function that did not change? And remember, if any of the functions you're calling changed, this will also affect your function: the hash of your function will change if any of the functions it calls changed. So the changes will propagate indeed. But you have a strong guarantee if the hash of your function is unchanged. That's it. You don't have to execute the tests anymore, indefinitely.
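Test caching works the same way as the build cache: the cache key is the hash of the pure function under test, so an unchanged hash means the result cannot differ. A sketch under that assumption (the function sources and tests here are hypothetical):

```python
import hashlib

def sha3(src: str) -> str:
    return hashlib.sha3_512(src.encode()).hexdigest()[:16]

test_cache = {}   # hash of the function under test -> cached test result
runs = 0

def run_test(fn_src: str, test) -> bool:
    global runs
    h = sha3(fn_src)
    if h not in test_cache:      # same hash + pure function => result cannot change
        runs += 1
        test_cache[h] = test()
    return test_cache[h]

inc = lambda x: x + 1
run_test("x -> x + 1", lambda: inc(2) == 3)
run_test("x -> x + 1", lambda: inc(2) == 3)      # cached: the test is not re-executed
run_test("x -> x + 2", lambda: (2 + 2) == 4)     # function changed: runs once more
print(runs)   # → 2
```

This is why a regression suite that takes hours in a conventional pipeline can shrink to only the tests whose transitive dependencies actually changed.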
And again, think about the typical CI/CD pipeline on big projects. Over time, the amount of regression tests will grow. If you can execute them once and store the results indefinitely, you're simply eliminating the execution of these tests. And in practice this means that you will not have 7 hours, or 15 hours, or on very big projects even more, of running tests every single time. And this is saving time, computing power, electricity, your nerves, whatever aspect you want.
Unison has something that's called adaptive service graph compression, because now there's the possibility to send code over the wire. So what you can see here is actual code that can be executed in the Unison Cloud environment, or in any kind of distributed Unison setup.
So you have the forkAt function. bob is the name of the remote node you're contacting. factorial 6 is a function call with an argument, and do is a keyword that introduces a delayed computation. So in do factorial 6, the keyword do will prevent factorial 6 from being executed on this machine. And it goes like this.
Node A sends do factorial 6 to remote machine B. What is being sent under the hood is the hash of the function with the argument six. Node B receives the hash and the arguments and says: I'm not familiar with this hash; can you provide me the binary code for the function this hash represents? Node A will receive this request, it will send the definition, and then, after node B collects all the dependencies, it will be able to execute this locally. So what happened here? Instead of node A sending something like JSON over the wire, executing on machine B, and machine B responding back with, again, JSON which needs to be serialized and deserialized, what you have here is node B collecting dependencies and starting to execute code locally.
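The request-missing-code-by-hash exchange can be sketched as follows. This is a toy model of the protocol just described, not Unison Cloud's actual wire format; a Python callable stands in for the compiled definition, and the single-hop fetch ignores transitive dependencies.

```python
import hashlib

def sha3(s: str) -> str:
    return hashlib.sha3_512(s.encode()).hexdigest()[:16]

def factorial(n: int) -> int:
    return 1 if n <= 1 else n * factorial(n - 1)

# Node A's codebase: hash -> compiled code (a Python callable stands in here).
fact_hash = sha3("factorial definition")
node_a = {fact_hash: factorial}

node_b = {}   # node B starts with an empty codebase

def call_on_b(h: str, arg: int) -> int:
    # Node B receives (hash, argument). If the hash is unknown, it requests
    # the definition from node A, caches it, then executes locally.
    if h not in node_b:
        node_b[h] = node_a[h]     # "send me the code this hash represents"
    return node_b[h](arg)

result = call_on_b(fact_hash, 6)
print(result)   # → 720, computed on "node B" with code fetched by hash
```

On the next call with the same hash, node B already has the code cached, which is the "just-in-time deployment" effect the talk describes: code migrates toward the place of execution.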
So you could say you have some kind of just-in-time deployment, where over time, as calls progress, code will migrate to the place of execution. I won't go further into this, but there are some serious consequences, and it takes some time for everyone to understand them.