
Symbolic Dependencies and Refactorability

What language features are (minimally) necessary in order to implement full refactorability of symbols? (Think “Rename Symbol” in your favorite IDE.) For example, when I moved from Java to Python, I found that refactoring interfaces was much harder. Is there a deeper explanation for this?

If the answer were simply types, could we then run some program that infers types in Python based on all the values that the variables take on (as determinable before run-time), and could we then get full refactorability in Python? For instance, consider the Python program

something = some_runtime_determined_boolean()
x = 2
if something is True:
    x = 3

We should be able to infer that x will always be an integer, even though Python is technically dynamically typed.
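
As a rough illustration of this idea (not a full type-inference algorithm), here is a sketch using Python’s ast module; the function infer_literal_types and its flow-insensitive behavior (branches are simply unioned together) are my own simplifications:

import ast

def infer_literal_types(source):
    """For each assigned name, collect the set of types of literal values it is
    assigned anywhere in the code. Very rough: flow-insensitive, literals only."""
    inferred = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    inferred.setdefault(target.id, set()).add(type(node.value.value).__name__)
    return inferred

program = """
something = some_runtime_determined_boolean()
x = 2
if something is True:
    x = 3
"""
print(infer_literal_types(program))   # {'x': {'int'}}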

A separate but related question is that of determining symbolic dependencies: given a symbol in a program, what does it depend on, and what other symbols in the program depend on it? These dependencies must be determined for refactoring of the original symbol to be possible. For example, if two different classes have methods named methodX(), and we want to refactor one of the methodX()’s, we need to take care not to change the usages of the other. In Java, the type specification seems to be what helps with this.

Let’s dive into symbolic dependencies in more detail.

Symbolic Dependencies

What is fully determinable before run-time is what can be evaluated before run-time. (This was the idea behind a personal project of mine: a “pre-computer” preprocessor that could evaluate everything knowable before run-time.)

Thus, dependencies defined at run-time (say, by arbitrary execution of code fetched from a remote resource) will not be determinable. Which dependencies can we determine?

Consider an expression in an untyped language, like

z = x + y

Here, x and y can be any objects, but at this point we know that they must support a “+” operation. (More precisely, the class of x must support “+” with an operand on its right, or the class of y must support “+” with an operand on its left; in Python these are the __add__ and __radd__ methods, respectively.)
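
A toy Python illustration of those two protocol methods (the Meters class here is purely hypothetical):

class Meters:
    """Illustrates how a class advertises support for '+'."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):    # Meters on the left of '+'
        return Meters(self.value + float(other))

    def __radd__(self, other):   # Meters on the right of '+'
        return Meters(float(other) + self.value)

print((Meters(2) + 1).value)   # 3.0, handled by __add__
print((1 + Meters(2)).value)   # 3.0, handled by __radd__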

This expression defines the symbol z to have dependencies on the symbols x and y. Thus, if we were to ask about reverse dependencies (which symbols depend on x?), then z would be one of them.

The symbol z could also be reassigned later, giving rise to new dependencies of z on other symbols, and correspondingly to z appearing among the reverse dependencies of those symbols.

Why should Java be any better than Python at reverse dependencies? Can’t we construct a deterministic dependency graph from the symbol definitions and assignments that show up in the code, and then get the reverse dependencies from it (just flip the directed edges)?
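
Before looking at that concretely, here is a minimal sketch of exactly this construction, using Python’s ast module. It only tracks top-level assignments and the names read on their right-hand sides, and the helper names (symbol_dependencies, reverse_dependencies) are my own:

import ast
from collections import defaultdict

def symbol_dependencies(source):
    """Forward dependencies: for each assigned name, the set of names read on
    the right-hand side of any of its assignments (branches included)."""
    deps = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            reads = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for target in node.targets:
                if isinstance(target, ast.Name):
                    deps[target.id] |= reads
    return deps

def reverse_dependencies(deps):
    """Flip every directed edge of the dependency graph."""
    rev = defaultdict(set)
    for symbol, reads in deps.items():
        for read in reads:
            rev[read].add(symbol)
    return rev

deps = symbol_dependencies("z = x + y")
print(dict(deps))                        # {'z': {'x', 'y'}}
print(dict(reverse_dependencies(deps)))  # {'x': {'z'}, 'y': {'z'}}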

Let us investigate this question in the context of the same-name-method issue from above. In Python, say we have:

class Class1:
    def method1(self):
        pass  # implementation

class Class2:
    def method1(self):
        pass  # implementation

Assume we’re not dealing with multiple inheritance, or at least that given an object it’s determinable pre-run-time which method1() it’s using. Then, we can totally determine reverse dependencies of each method1()! In other words (for most “normal” Python code), we totally can determine symbolic dependencies by analyzing the code!
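
Here is a minimal sketch of how such an analysis could resolve which class each call belongs to, under the strong assumptions above (objects created by direct Class() calls, never rebound, no inheritance); the name resolve_method_calls is my own:

import ast
from collections import defaultdict

def resolve_method_calls(source):
    """Map each obj.method() call back to the class whose constructor created
    obj, assuming direct Class() construction, no rebinding, no inheritance."""
    tree = ast.parse(source)
    var_class = {}             # variable name -> class (constructor) name
    calls = defaultdict(list)  # "Class.method" -> receiver variable names
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    var_class[target.id] = node.value.func.id
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in var_class):
            receiver = node.func.value.id
            calls[var_class[receiver] + "." + node.func.attr].append(receiver)
    return calls

program = """
a = Class1()
b = Class2()
a.method1()
b.method1()
"""
print(dict(resolve_method_calls(program)))
# {'Class1.method1': ['a'], 'Class2.method1': ['b']}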

To state this generally: if we assume that the language is such that no dependencies on symbols can be specified at run-time (so even for code that executes at run-time, we still know pre-run-time, from analysis of the code, which symbols are being depended on), then that is equivalent to determining symbolic dependencies fully (and, as a result, to determining reverse symbolic dependencies fully). The dependencies that cannot be determined are the ones specified at run-time (not even necessarily executed at run-time). Dependencies that only take effect at run-time but are still specified pre-run-time in the code can still be determined; for example, consider the program:

z = x + y
if runtime_boolean():
    z = a + b

We can determine that, in the worst case, z depends on x, y, a, and b. Thus, the reverse dependencies of any of these symbols must include z.
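
Reusing the symbol_dependencies sketch from the previous section, this worst-case union falls out directly:

branchy = """
z = x + y
if runtime_boolean():
    z = a + b
"""
# (uses symbol_dependencies from the sketch above)
print(dict(symbol_dependencies(branchy)))   # {'z': {'x', 'y', 'a', 'b'}}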

Is there a way to formalize this statement as a theorem, where the formalization should connect informally to our intuition? Is that necessary?

First, formalizing dependencies as a generic directed graph is straightforward, and reversing that graph is always mathematically possible. Thus, we can provably say that reverse dependencies are determined exactly when dependencies are determined; and, as part of the formalization, the dependency graph can be constructed pre-run-time exactly when the specifications are made pre-run-time (irrespective of whether those code paths actually get executed at run-time). So the statement above does seem to be provably true.

Refactoring of Symbols

Assuming that dependencies and reverse dependencies are fully determined, the language constructs tell us the nature of each dependency (for example, for the expression z = x + y, the dependence of z on x involves the “+” operator). This in turn determines how refactoring will work: when changing a symbol’s definition, how should we in turn change the symbols that depend on it?

We may define refactoring as changing the code of the program while still guaranteeing the same behavior. Then, given z = x + y, if we were to, say, change the definition of x from 2 to 1, automatic refactorability would require the tooling to know what “+” means for integers and to rewrite z = x + y accordingly as z = x + 1 + y, so that z keeps its original value.

More specifically, say that a symbol X depends on a symbol Y in a way Z. Say we have a “DependencyType” class, and the instances of this class determine the possible values for Z (generically). In order to enable refactoring of, say, the symbol Y in this setup, we would need the DependencyType class to implement a method like refactor(symbolY, symbolX), which returns the “way” the refactoring must occur (perhaps the new dependence of X on Y, possibly with a new DependencyType). This would be a formal model of how refactoring must be done.
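
Here is a minimal sketch of that model in Python; the class and method names (Dependency, DependencyType, refactor) are simply my own reading of the setup described above:

from dataclasses import dataclass

@dataclass
class Dependency:
    """Symbol X depends on symbol Y in some 'way' Z (a DependencyType)."""
    dependent: str            # X
    depended_on: str          # Y
    kind: "DependencyType"    # Z

class DependencyType:
    """A generic 'way' of depending on a symbol. Each concrete kind knows how
    the dependent symbol must be updated when the depended-on symbol changes,
    so that program behavior is preserved."""

    def refactor(self, symbol_changed, symbol_to_change):
        """Return the dependency as it should look after the refactoring
        (possibly with a different DependencyType)."""
        raise NotImplementedError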

Beyond this, without a detailed list of language features it is impossible to investigate specific cases of refactorability. But let us now look at one example: Python. Could this model explain why refactoring seemed harder (or undoable) in Python, and why it can be done more easily in Java?

Let us consider again the example of Python with two different classes having a method with the same name. Can we implement the refactor method and apply the model above?

I suspect that I just didn’t have the bandwidth (or the tools) to see this when I was first actively using Python, but this refactoring is completely doable in Python. In fact, if a method method1() is being used in the code, then we say that it must depend on one of the defined method1()’s (in the class code), with the same DependencyType (call it “being an instance” here). This DependencyType’s refactor method would look like (pseudo-code):

impl refactor(symbolChanged, symbolToChange) {
    return new Dependency(symbolChanged, symbolToChange, this);
}

Since the dependencies can already be sorted out (from the previous section), making it clear which objects come from which classes (based on type inference from what is specified in the code), this is doable!
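
Here is a sketch of what actually carrying out such a rename could look like, assuming the call receivers have already been resolved to their classes (for example by the resolve_method_calls sketch earlier) and ignoring inheritance and rebinding; all names here are my own:

import ast

class RenameMethod(ast.NodeTransformer):
    """Rename old -> new on (a) the def inside class_name and (b) every
    receiver.old() usage whose receiver is known to hold an instance of
    class_name. Simplified: no inheritance, receivers never rebound."""

    def __init__(self, class_name, old, new, receivers_of_class):
        self.class_name, self.old, self.new = class_name, old, new
        self.receivers = receivers_of_class  # names known to hold class_name instances
        self._in_target_class = False

    def visit_ClassDef(self, node):
        self._in_target_class = (node.name == self.class_name)
        self.generic_visit(node)
        self._in_target_class = False
        return node

    def visit_FunctionDef(self, node):
        if self._in_target_class and node.name == self.old:
            node.name = self.new
        self.generic_visit(node)
        return node

    def visit_Attribute(self, node):
        if (node.attr == self.old
                and isinstance(node.value, ast.Name)
                and node.value.id in self.receivers):
            node.attr = self.new
        self.generic_visit(node)
        return node

program = """
class Class1:
    def method1(self):
        pass

class Class2:
    def method1(self):
        pass

a = Class1()
b = Class2()
a.method1()
b.method1()
"""
tree = RenameMethod("Class1", "method1", "renamed_method", {"a"}).visit(ast.parse(program))
print(ast.unparse(tree))   # only Class1's def and a.method1() are renamed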

A Language for Full Dependency Specification

Is it possible to construct a language with full symbolic dependency specification? What is the minimal way we could do this?

(Any use cases that go beyond this minimum must therefore involve a tradeoff.)

Well, we need all dependencies to be specified pre-run-time. Assume that the program has been fully pre-computed, or at least that we can determine pre-run-time which code paths could be followed at run-time. Any reflection capabilities (which, for us, include eval()-type functionality) must not involve values that can only be computed at run-time. We can certainly construct a language that guarantees this.
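
As a rough sketch of the kind of check such a language (or a linter for a restricted Python subset) would enforce, here is a pass that flags reflective calls whose symbol-specifying argument is not a literal; the name runtime_reflection_sites is my own:

import ast

REFLECTIVE = {"eval", "exec", "getattr", "setattr", "delattr", "__import__"}

def runtime_reflection_sites(source):
    """Flag reflective calls whose name/code argument is not a literal constant:
    these are the places where a dependency would be specified only at run-time,
    defeating static dependency analysis."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in REFLECTIVE):
            # getattr/setattr/delattr take the symbol name as their second argument
            name_pos = 1 if node.func.id in {"getattr", "setattr", "delattr"} else 0
            if len(node.args) > name_pos and not isinstance(node.args[name_pos], ast.Constant):
                flagged.append((node.lineno, node.func.id))
    return flagged

program = """
x = getattr(obj, "size")          # fine: the attribute name is a literal
y = getattr(obj, name_from_user)  # flagged: dependency specified at run-time
eval(downloaded_code)             # flagged
"""
print(runtime_reflection_sites(program))   # [(3, 'getattr'), (4, 'eval')]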

Can we formalize this point? We can in fact define reflection as “code specification at run-time”, and the statements above then follow quickly and provably, since symbolic dependencies are, by definition, specified in code.

Conclusion

To summarize:

  1. Full understanding of symbolic dependencies is equivalent to full understanding of reverse symbolic dependencies, which is equivalent to all dependency specifications being made before run-time, which is also equivalent to reflection capabilities not being able to use values that can only be computed at run-time. This is, for example, what enables “Rename Symbol” functionality in IDEs.
  2. Full refactorability of symbols is equivalent to being able to implement an interface like the one for the refactor method given above.

I have seen arguments that invoke the Halting Problem to say that dependencies can never be fully determined: for example, if we have code like:

z = x
function_which_we_cant_know_whether_it_will_halt()
z = a

Then we can’t know whether z depends on a. However, we can say that in the worst case z could depend on a, so any refactorability functionality in an IDE can operate on worst-case assumptions and still solve the problem. Nevertheless, when we say that pre-run-time computation is what is minimally needed to know dependencies, that implicitly assumes that such computation does halt; otherwise the analysis needed to find the dependencies would never finish, so we couldn’t know the dependencies.

Project done in November 2020.
