Control and Correlation

I.

A thermostat is a simple example of a control system. A basic model has only a few parts: some kind of sensor for detecting the temperature within the house, and some way of changing the temperature. Usually this means it has the ability to turn the furnace off and on, but it might also be able to control the air conditioning. 

The thermostat uses these abilities to keep the house at whatever temperature a human sets it to — maybe 72 degrees. Assuming no major disturbances, the control system can keep a house at this temperature indefinitely.

In the real world, control systems are all over the place.

Imagine that a car is being driven across a hilly landscape.

A man is operating this car. Let’s call him Frank. Now, Frank is a real stickler about being a law-abiding citizen, and he always makes sure to go exactly the speed limit. 

On this road, the speed limit is 35 mph. So Frank uses the gas pedal and the brake pedal to keep the car going the speed limit. He uses the gas to keep from slowing down when the road slopes up, and to keep the car going a constant speed on straightaways. He uses the brake to keep from speeding up when the road slopes down.

The road is hilly enough that frequent use of the gas and brake is necessary. But it’s well within Frank’s ability, and he successfully keeps the needle on 35 mph the whole time. 

Together, Frank and the car form a control system, just like a thermostat, that keeps the car at a constant speed. You could also replace Frank’s brain with the car’s built-in cruise control function, if it has one, and that might provide an even more precise form of control. But whatever is doing the calculations, the entire system functions more or less the same way. 

Surprisingly, if you graph all the variables at play here — the angle of the road, the gas, the brake, and the speed of the car at each time point — speed will not be correlated with any of the other variables. Despite the fact that the speed is almost entirely the result of the combination of gas, brake, and slope (plus small factors like wind and friction), there will be no apparent correlation, because the control system keeps the car at a constant 35 mph. 

[Figure: High precision technical diagram]

Similarly, if you took snapshots of many different Franks, driving on many different roads at different times, there would be no correlation between gas and speed in this dataset either.

We understand something about the causal system that is Frank and his car, and how this system responds to local traffic regulations, so we understand that gas and brake and angle of the road ARE causally responsible for that speed of 35 mph. But if an alien were looking at a readout of the data from a bunch of cars, their different speeds, and the use of various drivers’ implements as they rattle along, it would be hard pressed to figure out that the gas makes the car speed up and the brake makes it slow down. 

II. 

We see that despite being causally related, gas and brake aren’t correlated with speed at all.

This is a well-understood, if somewhat understated, problem in causal inference. We’ve all heard that correlation does not imply causation, but most of us assume that when one thing causes another thing, those two things will be correlated. Hotter temperatures cause ice cream sales; and they’re correlated. Fertilizer use causes bigger plants; correlated. Parental height causes child height; you’d better believe it, they’re correlated. 

But things that are causally related are not always correlated. Here’s another example from a textbook on causal inference:

Weirdly enough, sometimes there are causal relationships between two things and yet no observable correlation. Now that is definitely strange. How can one thing cause another thing without any discernible correlation between the two things? Consider this example, which is illustrated in Figure 1.1. A sailor is sailing her boat across the lake on a windy day. As the wind blows, she counters by turning the rudder in such a way so as to exactly offset the force of the wind. Back and forth she moves the rudder, yet the boat follows a straight line across the lake. A kindhearted yet naive person with no knowledge of wind or boats might look at this woman and say, “Someone get this sailor a new rudder! Hers is broken!” He thinks this because he cannot see any relationship between the movement of the rudder and the direction of the boat.

Let’s look at one more example, from the same textbook: 

[The boat] sounds like a silly example, but in fact there are more serious versions of it. Consider a central bank reading tea leaves to discern when a recessionary wave is forming. Seeing evidence that a recession is emerging, the bank enters into open-market operations, buying bonds and pumping liquidity into the economy. Insofar as these actions are done optimally, these open-market operations will show no relationship whatsoever with actual output. In fact, in the ideal, banks may engage in aggressive trading in order to stop a recession, and we would be unable to see any evidence that it was working even though it was!

III.

There’s something interesting that all of these examples — Frank driving the car, the sailor steering her boat, the central bank preventing a recession — have in common. They’re all examples of control systems.

Like we emphasized at the start, Frank and his car form a system for controlling the car’s speed. He goes up and down hills, but his speed stays at a constant 35 mph. If his control is good enough, there will be no detectable variation in the speed at all. 

The sailor and her rudder are acting as a control system in the face of disturbances introduced by the wind. Just like Frank and his car, this control system is so good that to an external observer, there appears to be no change at all in the variable being controlled.

The central bank is doing something a little more complicated, but it is also acting as a control system. Trying to prevent a recession is controlling something like the growth of the economy. In this example, the economy keeps growing at about the same rate because of the central bank’s canny use of open-market operations, bonds, liquidity, etc. in response to some kind of external shock that would otherwise cause growth to stall or plummet, which is to say, cause a recession. And “insofar as these actions are done optimally, these open-market operations will show no relationship whatsoever with actual output.”

The same thing will happen with a good enough thermostat, especially if it has access to both heating and cooling / air conditioning. The thermostat will operate its different interventions in response to external disturbances in temperature (from the sun, wind, doors being left open, etc.), and the internal temperature of the house will remain at 72 degrees, or whatever you set it at.

If you looked at the data, there would be no correlation between the house’s temperature and the methods used to control that temperature (furnace, A/C, etc.), and if you didn’t know what was going on, it would be hard to tell what was causing what.

In fact, we think this is the case for any control system. If a control system is working right, the target — the speed of Frank’s car, the direction of the boat, the rate of growth in the economy, the temperature of the house — will remain about the same no matter what. Depending on how sensitive your instruments are, you may not be able to detect any change at all. 

If control is perfect, and Frank’s car stays at exactly 35 mph, then the system is leaking literally no information to the outside world. You can’t learn anything about how the system works because any other variable plotted against speed, even one like gas or brake, will look like a flat line: every point sits at 35 mph, whatever the other variable is doing.

This is true even though gas and brake have a direct causal influence on speed. In any control system that is functioning properly, the methods used to control a signal won’t be correlated with the signal they’re controlling. 

Worse, there will be several variables that DO show relationships, and may give the wrong impression. You’re looking at variables A, B, C, and D. You see that when A goes up, so does B. When A goes down, C goes up. D never changes and isn’t related to anything else — must not be important, certainly not related to the rest of the system. But of course, A is the angle of the road, B is the gas pedal, C is the brake pedal, and D is the speed of the car. 
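To make the A, B, C, D picture concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the toy dynamics (speed changes each step by gas minus brake minus slope plus a little wind), the sine-wave hills, and the names `simulate_frank`, `gain`, and `wind_sd`. A little wind is included because with truly perfect control the speed never varies at all, and the Pearson coefficient is then undefined (it divides by zero) rather than merely zero.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_frank(steps=20000, target=35.0, gain=0.5, wind_sd=0.05, seed=0):
    """Hypothetical toy model: speed changes by (gas - brake - slope + wind)."""
    rng = random.Random(seed)
    speed = target
    slope, gas, brake, speeds = [], [], [], []
    for t in range(steps):
        # Rolling hills: two sine waves with arbitrary periods.
        hill = math.sin(2 * math.pi * t / 200) + math.sin(2 * math.pi * t / 73 + 1.0)
        # Frank cancels the hill he sees and corrects any speedometer error.
        pedal = hill + gain * (target - speed)
        slope.append(hill)
        gas.append(max(pedal, 0.0))     # positive pedal = gas
        brake.append(max(-pedal, 0.0))  # negative pedal = brake
        speeds.append(speed)
        # Speed is caused by gas, brake, and slope, plus a little wind.
        speed += pedal - hill + rng.gauss(0.0, wind_sd)
    return slope, gas, brake, speeds


slope, gas, brake, speeds = simulate_frank()
print("speed range:", round(min(speeds), 2), "to", round(max(speeds), 2))
for name, series in (("slope", slope), ("gas", gas), ("brake", brake)):
    print(f"corr(speed, {name}) = {pearson(speeds, series):+.3f}")
print(f"corr(slope, gas)   = {pearson(slope, gas):+.3f}")
print(f"corr(slope, brake) = {pearson(slope, brake):+.3f}")
```

In this toy setup the three correlations with speed should come out close to zero, while slope tracks gas strongly and brake strongly in the opposite direction. That is exactly the pattern that makes D look unimportant even though A, B, and C are what determine it.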

If control isn’t perfect, or your instruments are sensitive enough to detect when Frank speeds up or slows down by fractions of an mph, then some information will be let through. But this doesn’t mean that you’ll be able to get a correlation. You may be able to notice that the car speeds up a little on the approach to inclines and slows down when it goes downhill, and you may even be able to tie this to the gas and brake. But it shouldn’t show up as a correlation — you would have to use some other analysis technique, but we’re not sure if such a technique exists.

And if you don’t understand the rest of the environment, you’ll be hard pressed to tell which variation in speed is leaked from the control system and which is just noise from other sources — from differences in friction across the surface of the road, from going around curves, from imperfections in the engine, from Frank being distracted by birds, etc.

IV.

This seems like it might be a big problem, because control systems are found all over biology, medicine, and psychology.

Biology is all about homeostasis — maintaining stability against constant outside disturbances. Lots of the systems inside living things are designed to maintain homeostatic control over some important variable, because if you don’t have enough salt or oxygen or whatever, you die. But figuring out what controls what can be kind of complicated. 

(If you’re getting ready to lecture us on the difference between allostasis and homeostasis, go jump in a pond instead.)

Medicine is the applied study of one area of biology (i.e. human biology, for the most part), so it faces all the same problems biology does. The human body works to control all sorts of variables important to our survival, which is good. But if you look at a signal relevant to human health, and want to figure out what controls that signal, chances are it won’t be correlated with its causes. That’s… confusing. 

Lots of people forget that psychology is biological, but it obviously is. The brain is an organ too; it is made up of cells; it works by homeostatic principles. This is an under-appreciated perspective within psychology itself but some people are coming around; see for example this recent paper.

If you were to ask us what field our book A Chemical Hunger falls under, we would say cognitive science. Hunger is pretty clearly regulated in the brain as a cognitive-computational process and it’s pretty clearly part of a number of complicated homeostatic systems, systems that are controlling things like body weight and energy. So in a way, this is psychology too.

It’s important to remember that statistics was largely developed in fields like astronomy, demography, population genetics, and agriculture, which almost never deal with control systems. Correlation as you know it was introduced by Karl Pearson (incidentally, also a big racist; and worse, a Sorrows of Young Werther fan), whose work was wide-ranging but largely focused on genetic inheritance. While correlation was developed to understand things like barley yields, and can do that pretty well, it just wasn’t designed with control systems in mind. It may be unhelpful, or even misleading, if you point it at the wrong problem.

For a mathematical concept, correlation is not even that old, barely 140 years. So while correlation has captured the modern imagination, it’s not surprising that it isn’t always suited to scientific problems outside the ones it was invented to tackle.

2 thoughts on “Control and Correlation”

  1. Anonymous says:

    In the DAG-based causal inference literature, this is known as “unfaithfulness” — where the structure of the model is such that there is in fact a causal pathway, but things line up just right for the effect to be cancelled out.

    There are methods for causal discovery that try to infer the causal graph structure by measuring lots of pairs of correlations, and slowly ruling out incompatible causal graphs — and assumptions of faithfulness are needed for these systems.

    I believe there are various arguments why faithfulness assumptions might be reasonable — for instance, if you assume you have a bunch of linear models whose coefficients are unknown samples from a normal distribution, then you have faithfulness with probability 1. And, come on, it’s not like nature is going to carefully pick values just to screw up your analysis…

    On the other hand, it seems like social scientists and economists tend to think faithfulness is a bad assumption to make, because obviously the people you’re studying will be trying to optimize their own decisions.

    Cosma Shalizi has a blog post with a simple worked out example (the thermostat also mentioned in this post).

    http://bactra.org/weblog/1178.html

    Here, as long as you can measure the control signal, and there is a little bit of noise, you can still end up getting everything right using standard techniques. I like his aphorism at the end: “Feedback is a mechanism for persistently violating faithfulness”.

    I don’t know nearly enough about causal inference, but here are some ill-informed and perhaps wrong thoughts:

    1) the notion of direct and indirect effect might be useful, I don’t understand these well, but know they’ve been studied — the effect of X on Y might be 0, but there are sometimes ways to decompose it into the direct effect, and the effect through mediators like the control system.

    2) If you can measure both outside signal (X) and control signal (C) and some outcome (Y), and you already know which is which, then adjusting for X blocks the backdoor path to Y, so you should get a correct estimate of the effect of C on Y.
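Point 2 can be sketched in a toy version of the car example. The model below is an assumption made for illustration, not something from the post or from Shalizi's write-up: X is the slope, C is the net pedal (gas minus brake), Y is the change in speed over one step, and the driver's pedal has a little random "slop" (`slop_sd`, a hypothetical parameter) so that C is not a perfect function of X. The fit is ordinary least squares on C and X together, with no intercept because the toy model has none.

```python
import math
import random


def simulate(steps=20000, target=35.0, gain=0.5, wind_sd=0.05, slop_sd=0.2, seed=1):
    """Hypothetical toy car: Y = C - X + wind, where Y is the change in speed."""
    rng = random.Random(seed)
    speed = target
    rows = []  # (X slope, C control, speed before the step, Y change in speed)
    for t in range(steps):
        x = math.sin(2 * math.pi * t / 200) + math.sin(2 * math.pi * t / 73 + 1.0)
        # Imperfect driver: cancels the slope, corrects errors, plus pedal slop.
        c = x + gain * (target - speed) + rng.gauss(0.0, slop_sd)
        y = c - x + rng.gauss(0.0, wind_sd)
        rows.append((x, c, speed, y))
        speed += y
    return rows


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_two(y, a, b):
    """Least-squares fit of y = ka*a + kb*b (no intercept), via the 2x2 normal equations."""
    saa = sum(u * u for u in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * w for u, w in zip(a, y))
    sby = sum(v * w for v, w in zip(b, y))
    det = saa * sbb - sab * sab
    return (say * sbb - sby * sab) / det, (sby * saa - say * sab) / det


rows = simulate()
X = [r[0] for r in rows]
C = [r[1] for r in rows]
S = [r[2] for r in rows]
Y = [r[3] for r in rows]

naive = pearson(C, S)                  # pairwise correlation of pedal with speed
effect_c, effect_x = ols_two(Y, C, X)  # effect of C on Y, adjusting for X
print(f"corr(pedal, speed)      = {naive:+.3f}")
print(f"effect of pedal on dv   = {effect_c:+.3f}  (true value in this toy model: +1)")
print(f"effect of slope on dv   = {effect_x:+.3f}  (true value in this toy model: -1)")
```

In this setup the pairwise correlation between pedal and speed stays weak, and can even come out slightly negative, while the adjusted estimates should land near the true coefficients. It only works because the slope is measured, the roles of X, C, and Y are already known, and the pedal has some variation that the slope does not explain.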


  2. Hi there!

    Great stuff, as always.

    About hunger and correlation, how about considering hunger separately from the physiological need to eat? Then hunger would be the sensation that you feel, so more like a psychological sensation. I’m no specialist, just throwing this in there 🙂

    Btw, I’m actually an electronics engineer and correlation is a big thing, especially for analog systems. We call the field automation and it’s usually analysed by measuring the internal state of the system, what signal(s) are fed back into it, usually with a delay, and how they are combined. Simple example at: https://en.wikipedia.org/wiki/Negative-feedback_amplifier

    I hope that this helps.

    Kind regards.

    Pierre.

