A New Way to Detect Causality in Time-Series: Interview with Alejandro Rodriguez Dominguez

Microprediction
15 min read · Nov 1, 2024

I virtually sat down with Alejandro Rodriguez Dominguez to discuss his recent paper with Om Hari Yadav on causality detection. I’m always on the lookout for new tricks to sharpen the search for relevant exogenous data, and I was thinking of including Alejandro and Om Hari’s ideas in my entry in the ADIA Lab Causal Discovery Contest that recently concluded (see my remarks here).

It’s too late for that, but I really enjoyed hearing from Alejandro about the approach. I would think it might make sense to at least glance at these two papers:

  • Geometric Spatial and Temporal Constraints in Dynamical Systems and Their Relation to Causal Interactions between Time Series (paper)
  • Causal Interactions Indicator Between Two Time Series Using Extreme Variations in First Eigenvalue of Lagged Correlation Matrices (paper)

before reading this post, but that’s up to you.

Peter: Alejandro, you’ve written an intriguing paper with Om Hari Yadav that caught my attention. You make the claim that (correct me if I’m wrong) measuring the standard deviation of the relative contribution of the first eigenvalue of a 2 x 2 lagged covariance matrix can provide more statistical power than, say, Granger Causality tests. Did I get that right, and can you perhaps explain it to us with more clarity?

Alejandro: To recap, the method consists of the following: Given two time-series random variables X and Y, the goal is to obtain an indicator that measures the probability of a causal interaction occurring from X to Y at each time t. To achieve this, the relative contribution of the first eigenvalue is calculated for different covariance matrices at time t, with X lagged by τ (that is, evaluated at t−τ) for lags up to a maximum value. The standard deviation of this relative contribution across the different lags then provides an indicator at that specific time t, measuring the probability of an interaction occurring from X to Y. Critical values can be obtained from the Tracy-Widom distribution. Additionally, by monitoring the indicator as a time-series, one can observe extreme values of the indicator for a particular t or for a specific pair of cause-and-effect variables, which would indicate a high probability of a causal interaction.
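[For readers who think in code, here is a minimal Python sketch of that recipe as I understand it. The window length, the maximum lag, the rolling-window scheme, and the names `first_eig_share` and `interaction_indicator` are my own illustrative choices, not taken from the paper; see the authors’ repository, linked at the end, for the reference implementation.]

```python
import numpy as np

def first_eig_share(x, y):
    """Relative contribution of the largest eigenvalue of the 2x2 covariance matrix."""
    eig = np.linalg.eigvalsh(np.cov(np.vstack([x, y])))  # ascending eigenvalues
    return eig[-1] / eig.sum()

def interaction_indicator(x, y, t, window=50, max_lag=10):
    """Std of the first-eigenvalue share across lags of x, evaluated at time t.

    Larger values suggest a higher probability of an interaction from x to y.
    """
    shares = [
        first_eig_share(x[t - tau - window: t - tau],  # x shifted back by tau
                        y[t - window: t])
        for tau in range(1, max_lag + 1)
    ]
    return np.std(shares)

# Toy usage: y responds to x with a delay of three steps.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 0.8 * np.roll(x, 3) + 0.2 * rng.standard_normal(500)
print(interaction_indicator(x, y, t=400))
```

[In practice one would monitor this indicator as a time series and flag its extreme values, with critical values taken from the Tracy-Widom distribution as Alejandro describes.]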

To contrast with Granger Causality test:

  1. The method is based on the physical definition of an interaction (as will be discussed later). It is a method for detecting interactions between two time series, with causality added afterward through a temporal constraint by applying the time lag once the interaction is detected. (Yes, interactions can be non-causal, as we will see later.) Since it is strictly focused on interaction detection, it outperforms Granger in this regard. Granger’s approach, by contrast, examines the predictive power of causal variables rather than determining whether and when interactions occur.
  2. This method assumes that interactions are rare, uncommon events. Other causal approaches assume that variables interact continuously or nearly continuously, which is the case with the vast majority of methods and books published in finance, econometrics, and AI today. In this work, an interaction between two time series is considered a rare or uncommon event, forming part of a chain of many interconnected events from numerous different variables (many = almost infinite and unobservable). However, the interaction of interest, specifically between the two time-series, is a rare event within this causal chain. In the case of two variables and a single interaction, the direction of time alone (order of events) is sufficient to identify the interaction as causal. However, with more interactions and more variables, the problem becomes more complex. The key point is that this method falls within the framework of the statistics of uncommon or extreme events (Extreme Value Theory), as opposed to Granger’s common-event or normal statistics.

It can be concluded that this method has greater statistical power in estimating causal interactions between two time-series than the Granger Causality test:

  1. Because it is purely focused on detecting interactions from the physical definition of an interaction, rather than from a predictive relationship between variables that is not necessarily interactive.
  2. Because Granger cannot guarantee the absence of bi-directionality in relationships between X and Y, while this method can, by using the direction of time through the applied lag. In fact, causality is a component incorporated into the analysis when the lag is introduced (as a temporal direction constraint).
  3. The problem of “correlation does not imply causation” does not apply here because the indicator does not use the correlation between the series to analyze causality at all, but rather as part of the methodology to physically and geometrically assess whether there is an interaction in a 2-dimensional space between the two series with a certain probability. Remember that the covariance matrix is used to estimate the probability of an interaction, and the constraint of causality is added afterward through the lag.
  4. Additionally, it is a method based on the local monitoring of uncommon or extreme events, which we believe better represent causal interactions between two time-series over a test sample. This contrasts with the Granger test, which assumes that interactions are very common (in continuous or discrete time) throughout the entire test sample of the two time-series. This is why we believe that the Granger test not only has less statistical power but is also less realistic in estimating causal interactions.

Peter: Before we get deep into the connection to physics, can you provide any further purely statistical intuition for why this diagnostic might be particularly effective?

Alejandro: It is effective because it separates the problem into two parts: interactions and causality. Interactions must occur in space as a function of time. If you think of two time-series as geometric objects on a 2D plane, you can transform the problem into a physical experiment where interactions are better understood from a statistical perspective (like the Coulomb gas line). Causality can then be incorporated later through temporal constraints, such as the direction of time (the time arrow, or ordering of events). Another intuition is the variability of the explanatory power of the largest eigenvalue of a system of two variables, which can be understood as deviations from equilibrium; these obviously require an interaction. The greater the deviation from equilibrium, the higher the probability of interaction, and causality can be added afterward in the form of a temporal constraint with the lag.

The point of using Granger Causality as a comparison example in the paper is also to differentiate between the literature that assumes interactions are continuous or very common between variables (Granger, Causal Networks, Judea Pearl followers, etc.) and those who believe they are not. Realistically, in financial markets, we have limited observability of the entire chain of events, making it impossible to fully connect the causal chain, which causes continuous methods or those trying to extract common rules or do-calculus to fail in generalization, except for simplistic or spurious cases. Therefore, we wanted to avoid that path from the beginning and focus on at least finding one purely causal interaction, then start connecting points from there, instead of creating a network of unreal points and errors across an unobservable sea of truly deterministic information.

Peter: Do you think it would be possible to construct a generative model where the statistical power could be established analytically relative to Granger Causality (or other measures)?

Alejandro: Relating to the previous question, we believe that each problem requires a proper definition of causality. One might consider testing the hypothesis of whether X causes Y, or alternatively, one could explore what causes the dependence between X and Y. Both problems would require different definitions, geometries, and methods. Both would be useful — one for prediction and the other for diversification. We think that Granger causality is not the ideal common denominator, and that an ideal common denominator does not exist. The true common denominator is reality, such as the performance of models out-of-sample, for example in systematic investment strategies with causal signals.

Now, Granger is a good common denominator, even though we’ve seen it has limitations, for many other methods in the literature (which assume interactions are continuous or very common). For those, we believe it might be possible to build that generative model you mentioned to answer your question.

Peter: Any remarks on pre-processing the time-series?

Alejandro: We have no particular remarks; the method is very straightforward. That said, we understand that there may be datasets requiring some adaptability (high-frequency applications, alternative datasets, textual data, image data, etc.). Since the method can be applied to any time-series data, the usual ML preprocessing techniques apply. One thing to consider: the maximum lag is a function of the data frequency, as the latter determines whether causal processes might have more or less delay, or lag, between the causal event and the effect event.

A bonus tip we offer: if you plot the explanatory value against the lag for all t, you can study the distribution of the relative contribution of the first eigenvalue as a function of the lags. This allows you to check the causal lag pattern between the variables for that dataset, or to identify what delay to expect between cause and effect, depending on various conditions of interest.
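[Here is a hedged sketch of that tip, reusing the illustrative `first_eig_share` helper and the same assumed window/lag choices from the earlier snippet: tabulate the share per (t, lag) and inspect the column-wise distribution to read off the typical cause-to-effect delay.]

```python
import numpy as np

def first_eig_share(x, y):
    eig = np.linalg.eigvalsh(np.cov(np.vstack([x, y])))
    return eig[-1] / eig.sum()

def lag_profile(x, y, window=50, max_lag=10):
    """Rows are timestamps, columns are lags 1..max_lag."""
    times = range(window + max_lag, len(x) + 1)
    profile = np.empty((len(times), max_lag))
    for i, t in enumerate(times):
        for j, tau in enumerate(range(1, max_lag + 1)):
            profile[i, j] = first_eig_share(
                x[t - tau - window: t - tau], y[t - window: t])
    return profile

# With y lagging x by three steps, the lag-3 column should stand out.
rng = np.random.default_rng(1)
x = rng.standard_normal(400)
y = 0.8 * np.roll(x, 3) + 0.2 * rng.standard_normal(400)
profile = lag_profile(x, y)
print(profile.mean(axis=0))  # mean share per lag
print(profile.std(axis=0))   # spread per lag
```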

Peter: You write that “causality is defined as the requirement that two points in space-time cannot communicate with each other if they are separated by a spacelike distance (Rosenfelder, 1989). Therefore causality implies interactions.” Is there an ELI5 rephrasing?

Alejandro: If Peter and Alejandro are separated in space-time, no matter how much Peter or Alejandro travel forward or backward in time, there will be no cause-effect relationship between them unless the spatial distance between them is zero at some point in time (such as when shaking hands), as this would constitute an interaction.

Peter: Gotcha, thanks. Some readers would be interested in the physics and your complementary paper “Geometric Spatial and Temporal Constraints in Dynamical Systems and Their Relation to Causal Interactions between Time Series”. Let’s get deeper into the physics analogy.

You mention an isometry (I think) between a temporal model and a spatial one. The latter I understand as two variables representing positions of walls that squeeze an ideal gas, and where an increase in variability in this distance implies an increased chance of causal relationship. I buy it.

But the former system you have to explain to me. What is the nature of the interaction between the particles that pass close to each other?

Alejandro: Indeed, it is an isometry, where the top graph represents space-time, since the walls are moving in time, while the other graph you are referring to is spatial, representing three different timestamps: before, during, and after the causal interaction. A portion of the two time-series is represented in a 2D plane as variables A and B with their temporal arrows of time, tA and tB. This representation is used in physics and is called a temporal bridge. For example, here I show an image from another paper (GitHub — rrtucci/mappa_mundi: Causal DAG Extraction from Text (DEFT)), in which they use an equivalent (graphically modified) temporal bridge representation:

Figure: Bridges span two DAGs (i.e., movies). We consider two possibilities: bridges a and b cross or they don’t. (Source: GitHub — rrtucci/mappa_mundi)

In the case of our complementary paper, the figure related to the one above is:

This graph should not be confused with what we would see if we plotted the series with the value axis as the ordinates and the temporal axis as the abscissas. It is an abstract 2D plane to demonstrate that the constraint of temporal ordering, which gives rise to causality, is met in both cases of the isometry.

In this figure, it is assumed that on the left side, there is no interaction yet (it will occur soon), while on the right side, there are two cases: the top subfigure shows a causal interaction, and the bottom subfigure represents a timestamp after the causal interaction.

For the interaction to be causal, the temporal ordering criterion must be satisfied. The concept of time arrows is used to symbolize the different trajectories over time of the different variables, resulting in different orders of occurrence in their interactions depending on where their arrows intersect relative to each variable’s position along its own time arrow. What is certain is that if A causally affects B, A’s time arrow must intersect B’s time arrow ahead of B’s position on that arrow. Put differently, the event in A that causes B must occur before B on B’s arrow of time, tB (in the top-right example, A comes first on the arrow and B comes after). Also, the arrow of time of A, tA, intersects that of B, tB, ahead of B’s position on its arrow, so that the segment satisfies ‖L‖ > 0 before the interaction, during the interaction, and after the interaction if the interaction is causal. If this were not the case, we would have ‖L‖ < 0 (the reader can see that if tA intersects tB from behind B’s position on tB, then ‖L‖ < 0), which under the isometry with the Coulomb gas would mean the two walls are compressed to the point where they cross each other and the Coulomb gas space becomes negative. In the Coulomb gas representation, a negative space would embody this constraint, while in the time-series representation it would signify a reversal of temporal ordering, thus violating causality. So the isometry preserves the ordering of time, and hence causality, in the two-time-series representation.

Once the interaction occurs, in the next timestamp (bottom-right subfigure), A and B continue along their respective time arrows. The segment ‖L‖ remains positive for the interaction to be causal; otherwise, it would imply a reverse temporal ordering that breaks causality as mentioned before.

Peter: Okay, so now remind us why the Tracy-Widom distribution emerges from the Coulomb gas model. Is there any shortcut to grok why the Tracy-Widom distribution should be roughly what it is?

Alejandro: The Tracy-Widom distribution is commonly used to describe the positive extreme values of the largest eigenvalue λ_max of random matrices. It is relevant for studying fluctuations in systems where the largest eigenvalue plays a critical role, particularly in the field of extreme value statistics (EVS). A notable application of the statistics of λ_max is in assessing stability in dynamic systems like ecosystems, as first proposed by May in 1972. In his work, May examined whether the interactions within a system were strong enough to drive it out of equilibrium, a question governed by the statistics of λ_max (whose Tracy-Widom fluctuations were characterized much later). This was one of the first physical applications of λ_max statistics and pointed to the existence of a sharp phase transition in such systems.

Phase Transitions and the Coulomb Gas Model

The large deviation function of λ_max in this context acts similarly to the free energy in statistical physics, specifically in a Coulomb gas model. In this model, a gas of charged particles interacts under both a repulsive Coulomb force (which pushes particles apart) and an external harmonic potential (which pulls them toward the origin). This competition leads to a stable equilibrium configuration. The critical point in this model corresponds to a third-order phase transition — a phase transition marked by a discontinuity in the third derivative of the free energy. This type of third-order transition is surprisingly common and has been observed in a range of contexts, including two-dimensional quantum chromodynamics (QCD) models.

Calculating Typical and Atypical Fluctuations

The Coulomb gas model provides insight into two types of fluctuations of λ_max:

  1. Typical fluctuations, where λ_max − ⟨λ_max⟩ = O(N^(−2/3)), are small deviations near the equilibrium value.
  2. Atypical large fluctuations, where λ_max − ⟨λ_max⟩ = O(1), represent rare, extreme deviations.

These fluctuations can be analyzed through the Coulomb gas partition function, where the configuration of particles in one dimension (subject to harmonic confinement and a hard wall) simulates the eigenvalue distribution. The cumulative distribution function (CDF) of λ_max can then be expressed as the ratio of two partition functions, which allows for an understanding of both typical and atypical fluctuations.
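[A quick numerical way to grok the typical-fluctuation scaling — my own illustration, not from the paper: sample GOE matrices and check that the rescaled largest eigenvalue concentrates at the spectral edge, 2, with fluctuations of order N^(−2/3), the Tracy-Widom regime.]

```python
import numpy as np

def largest_goe_eigenvalue(n, rng):
    """Largest eigenvalue of an n x n GOE matrix."""
    a = rng.standard_normal((n, n))
    h = (a + a.T) / np.sqrt(2)  # symmetric Gaussian (GOE) matrix
    return np.linalg.eigvalsh(h)[-1]

rng = np.random.default_rng(42)
for n in (50, 100, 200, 400):
    samples = np.array([largest_goe_eigenvalue(n, rng) for _ in range(200)])
    rescaled = samples / np.sqrt(n)  # concentrates at the spectral edge, 2
    # If the Tracy-Widom scaling holds, std(rescaled) * n^(2/3) stays roughly constant.
    print(n, rescaled.mean().round(3), (rescaled.std() * n ** (2 / 3)).round(3))
```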

Analogy with Quantum Chromodynamics (QCD)

In two-dimensional U(N) lattice QCD models, a similar third-order phase transition was discovered, known as the Gross-Witten-Wadia transition. In QCD, the transition from strong to weak coupling mirrors the stable-to-unstable phase transition in May’s ecosystem model. In both cases, the transition reflects a shift in the stability of the system, dependent on a critical threshold — whether it’s a coupling constant in QCD or an interaction strength in the ecological model.

Implications of the Tracy-Widom Distribution

The Tracy-Widom distribution, particularly in May’s ecological model, captures the behavior of λ_max as one moves across this critical threshold, where small fluctuations around the threshold characterize a transition in the stability of the system. This “crossover” behavior of λ_max is indicative of the system moving between two phases: stable (low interaction strength) and unstable (high interaction strength). Similar crossover behaviors are observed in other ensembles, such as Gaussian, Wishart, and Cauchy, where large deviation principles apply.
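[For a hands-on feel of that stable-to-unstable crossover, here is a toy May (1972)-style experiment of my own, assuming the standard setup: a community matrix M = −I + A, with A an N x N Gaussian interaction matrix of entry-wise standard deviation σ. Stability holds while σ√N < 1 and is lost sharply once σ√N > 1.]

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
for strength in (0.5, 0.9, 1.1, 1.5):  # strength = sigma * sqrt(n)
    a = rng.standard_normal((n, n)) * (strength / np.sqrt(n))
    m = -np.eye(n) + a                  # May-style community matrix
    top = np.linalg.eigvals(m).real.max()
    print(strength, round(top, 3), "stable" if top < 0 else "unstable")
```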

In conclusion, the Tracy-Widom distribution provides a powerful statistical framework for analyzing phase transitions in systems where large fluctuations in the leading eigenvalue indicate critical changes in stability, applicable across fields from ecosystems to quantum field theory.

Peter: Now that we’ve discussed the physics, is there any way to understand the power of your new statistic from a physical perspective?

Alejandro: The indicator addresses the issue of causal interactions between two time-series from a physical perspective.

If we want to explain the indicator from a physical perspective, it serves as a probabilistic measure of a system breaking its equilibrium due to an interaction. By using two time series within the system, with one of them lagged, we are enforcing the interaction to be causal (from a May 1972 physical perspective). From the Coulomb gas perspective, it’s the same concept, but with the system undergoing a phase change due to a charge generated by the movement of walls that compress and expand it. Once again, the interaction is enforced to be causal by using two time series with lags in the isometric representation, as explained earlier.

The statistic has not yet been applied to a physical problem because the isometry was from a space-time representation (Coulomb gas) to a spatial one (the time-series interactions), where causality was then enforced as a constraint with the lag. Therefore, in physical experiments, the lag was not needed. However, it could be used in physics for cases where one wants to analyze the interaction component separately from the temporal component. For example, if one wants to determine the optimal lag between cause and effect in a nuclear reaction or in particle interactions (or, for example, how many oscillations of the walls are needed for a phase transition in the Coulomb gas experiment).

Peter: Second last question. The examples you give involve a relatively small number of candidate causal relationships. Does this scale to thousands or millions of variables potentially?

Alejandro: This question opens the door for a second paper. First, a sequential search approach could be implemented, where the method is applied iteratively to maximize the value of the indicator according to a set of rules. In this case, the search algorithm would differ in nature from the indicator itself, and any existing methods in the literature could be applied.
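[A minimal sketch of such a sequential screen, assuming the rolling-window indicator sketched earlier in this post; the ranking rule, the top-k cutoff, and the names `indicator` and `screen_candidates` are my illustrative assumptions, not from the paper.]

```python
import numpy as np

def first_eig_share(x, y):
    eig = np.linalg.eigvalsh(np.cov(np.vstack([x, y])))
    return eig[-1] / eig.sum()

def indicator(x, y, t, window=50, max_lag=10):
    return np.std([first_eig_share(x[t - tau - window: t - tau],
                                   y[t - window: t])
                   for tau in range(1, max_lag + 1)])

def screen_candidates(candidates, y, t, top_k=5):
    """Rank named candidate cause series by indicator value at time t."""
    scores = {name: indicator(x, y, t) for name, x in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy usage: one of 100 candidates drives y with a three-step delay.
rng = np.random.default_rng(2)
candidates = {f"x{i}": rng.standard_normal(500) for i in range(100)}
y = 0.8 * np.roll(candidates["x7"], 3) + 0.2 * rng.standard_normal(500)
print(screen_candidates(candidates, y, t=400))
```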

However, a more intriguing direction would be to explore the high-dimensional perspective of the proposed solution involving the two time-series. It would be interesting to investigate lattice Quantum Chromodynamics (QCD) (where QCD is the quantum theory of the strong interaction, while Quantum Electrodynamics (QED) is the quantum theory of the electromagnetic interaction). Specifically, low-energy QCD, which studies interactions among multiple quantum particles in relation to gauge theory, could provide valuable insights. Some aspects of this relate to the Tracy-Widom distribution in multiple dimensions, while other approaches warrant further exploration. Below, some images are shared. The first image shows a space-time lattice of protons and neutrons, which QCD researchers have tried to model using various methods, including the Tracy-Widom distribution. Below that, we see two more complex lattices where QCD examines interactions among multiple particles, with some researchers focusing their approach on high-dimensional random matrix theory. These are the areas we would like to explore further to generalize our approach.

Figure: Two-dimensional representation of the space-time lattice. The smallest length on the lattice is the lattice spacing a, and the protons (p) and neutrons (n) are placed on the lattice sites. (Source: Meissner, Ulf-G. (2014).)

Figure: Phase diagram of strongly interacting matter. Here, ρ denotes the density, with ρ_N the density of nuclear matter, and T is the temperature. (Source: Meissner, Ulf-G. (2014).)

Figure: Theoretical diagram based on Lattice QCD simulations depicting expected quark-gluon phase transition. (Source: https://www.jicfus.jp/en/promotion/pr/mj/guido-cossu/)

Peter: And finally, the most important question for many readers: can you point us to Python or R examples?

Alejandro: Certainly, here is the GitHub repository where we have included code relevant to the paper and experiments.

References:

  1. GitHub — rrtucci/mappa_mundi ← Very interesting repo!
  2. Meissner, Ulf-G. (2014).
  3. R. May, Will a Large Complex System be Stable?, Nature 238, 413–414 (1972).
  4. Guido Cossu’s page: https://www.jicfus.jp/en/promotion/pr/mj/guido-cossu/
