# Ensembles¶

In statistical mechanics, only partial information about a system can be observed: macroscopic thermodynamic observables do not reveal the full detail of the microstates. The continuum of points in phase space, i.e., the phase space distribution, is used to represent the internal structure of the system. For equilibrium physics, the probability distribution of the microstates $$\rho(\{p_i\}, \{q_i\})$$ may be used to calculate a macroscopic observable $$\mathscr O$$, i.e., $$\langle \mathscr O \rangle = \int \mathscr O (\{p_i\}, \{q_i\}) \rho(\{p_i\}, \{q_i\}) d\Omega$$. The question is how to obtain the probability density $$\rho(\{p_i\}, \{q_i\})$$ over phase space points. In the language of statistics, we either need a know-all theory that covers the whole population of microscopic states, or a good method to sample some representative microstates.

In Boltzmann’s theory, this problem is solved by applying the least-information assumption, i.e., the principle of equal a priori probabilities. Using this, Boltzmann found a better way of calculating observables by deriving the probabilities of the single-particle energy states. A distribution in Boltzmann’s theory is shown in the following table.

| single-particle energy | number of particles | degeneracy |
| --- | --- | --- |
| $$e_1$$ | $$n_1$$ | $$g_1$$ |
| $$e_2$$ | $$n_2$$ | $$g_2$$ |
| $$e_3$$ | $$n_3$$ | $$g_3$$ |
| $$\cdots$$ | $$\cdots$$ | $$\cdots$$ |
| $$e_i$$ | $$n_i$$ | $$g_i$$ |

The most probable distribution, i.e., the distribution that appears most frequently, is used to calculate the thermodynamic observables. This assumes that the system moves dynamically through phase space but stays close to the most probable distribution most of the time. This is our sampling method, and the most probable distribution is our statistical sample. It may sound too extreme to use a single distribution as a statistical sample. However, the macroscopic observables barely fluctuate because the most probable distribution is sharply peaked, almost delta-like.
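As a minimal numerical illustration (a toy example, not from the original text), consider $$N$$ independent two-state particles: the fraction in one state follows a binomial distribution whose relative width shrinks like $$1/\sqrt{N}$$, which is why a single distribution can serve as a representative sample.

```python
import math

# Relative fluctuation around the most probable value shrinks like
# 1/sqrt(N): a toy model of N independent two-state particles, where
# the number in the upper state is binomially distributed.
def relative_width(N, p=0.5):
    mean = N * p
    std = math.sqrt(N * p * (1 - p))
    return std / mean  # fluctuation relative to the most probable value

for N in (100, 10**4, 10**6):
    print(N, relative_width(N))
```

For $$N \sim 10^{23}$$ particles the relative fluctuation is of order $$10^{-12}$$, effectively unobservable.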

However, single-particle energy states are not well defined in systems with strong interactions among the particles. In such systems, the probabilities of the different distributions over single-particle energy states cannot be defined either.

This problem is solved by the ensemble method. Instead of collecting statistical samples over different microstates along the time dimension, we make virtual copies of the system, aka replicas, and sample among these virtual copies. The replicas interact with each other as well as with the environment of the original lab system. The energies of the replicas are denoted $$E_i$$, and the number of replicas with energy $$E_i$$ is denoted $$N_i$$. There are multiple ways of configuring the ensemble so that it satisfies our thermodynamic conditions. A distribution in ensemble theory is shown in the following table.

| replica energy | number of replicas | degeneracy |
| --- | --- | --- |
| $$E_1$$ | $$N_1$$ | $$G_1$$ |
| $$E_2$$ | $$N_2$$ | $$G_2$$ |
| $$E_3$$ | $$N_3$$ | $$G_3$$ |
| $$\cdots$$ | $$\cdots$$ | $$\cdots$$ |
| $$E_i$$ | $$N_i$$ | $$G_i$$ |

The most probable distribution shall be used as a statistical sample to calculate the thermodynamic observables.

A comparison of Boltzmann theory and ensemble theory is shown in the following chart.


Ensemble Methods in Machine Learning

In the field of machine learning, ensemble methods have been quite popular. Ensemble learning trains an ensemble of models and aggregates the ensemble predictions into one output.

For example, the random forest method takes the data and makes $$N$$ copies of it using a sampling method such as Bootstrap Aggregating (Bagging); each of these copies is used to train a decision tree model. The final result is the “average” of all the decision tree results.

Imagine that each row of the dataset describes a particle; the dataset then describes a box of particles. The interactions between the particles are not stated explicitly, yet the interactions are the source of the patterns. Thus an ultimate machine learning algorithm should find a way to capture the interactions, though that is very hard in practice.

Random forest creates copies of the original system, thus creating an ensemble. It is assumed that sampling with replacement creates copies with similar macroscopic properties. Each copy is used to calculate a prediction.

In random forest, the distribution of all the predictions is not necessarily sharp. The solution is to take the average. Taking the average assumes that each decision tree makes the same contribution, i.e., is “democratic”. To enforce this “democracy”, the random subspace method, or feature bagging, is introduced. Feature bagging is the core of the random forest method.
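A minimal sketch of bagging, assuming a toy dataset and a deliberately trivial stand-in model (a real random forest would train decision trees and additionally restrict each tree to a random feature subset):

```python
import random

# Bootstrap Aggregating (bagging) sketch: resample the data with
# replacement, train one model per resample, average the predictions.
def bootstrap_sample(data):
    # sample len(data) rows with replacement
    return [random.choice(data) for _ in data]

def train(sample):
    # hypothetical stand-in model: always predict the mean training target
    ys = [y for _, y in sample]
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def ensemble_predict(models, x):
    # "democratic" aggregation: every model contributes equally
    return sum(m(x) for m in models) / len(models)

random.seed(0)
data = [(x, 2.0 * x) for x in range(10)]
models = [train(bootstrap_sample(data)) for _ in range(100)]
prediction = ensemble_predict(models, 3.0)
```

Averaging over many bootstrap replicas reduces the variance of the prediction compared to any single model, which is the point of the aggregation step.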

In the ensemble theory of statistical physics, the macroscopic quantities are calculated using the most probable distribution, not the average. This is justified by the sharply peaked distribution of distributions.

However, one could imagine that the random forest tricks would be useful if the ensemble method were applied to a small physical system.

## Ensemble¶

Gibbs’ idea of ensemble is to create replicas of the system under certain macroscopic conditions. For different macroscopic conditions, we then derive the observables differently.

Ensemble Theory and Ergodicity

Ergodicity means that the system visits all accessible states many times over a long time; in particular, every possible state occurs at least once. On the other hand, not all systems are ergodic. For such non-ergodic systems, the concept of ensemble is problematic since an ensemble might not represent the physical process anymore.

Questions

1. Is the ensemble average the same as the time average?
2. Is the physical system ergodic?
   1. Even complicated systems can exhibit almost exactly periodic behavior; one example is the FPU experiment.
   2. Even if the system is ergodic, how can we make sure each state occurs with the same probability?

As an example of a non-ergodic system, prepare a box with absolutely smooth walls and balls that collide with the walls perpendicularly. This system then stays on a few discrete points in momentum space, with the same values of the momentum components, no matter how many particles are involved.

Fig. 20 An example of non ergodic system.
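The perpendicular-bounce picture can be sketched in a few lines (a toy simulation; all names are hypothetical). The momentum only ever takes the two values $$\pm p$$, so the trajectory is confined to a measure-zero subset of the energy shell and time averages cannot equal microcanonical ensemble averages.

```python
# One ball bouncing perpendicularly between two perfectly smooth walls:
# elastic reflection only flips the sign of p, so exactly two momentum
# values are ever visited, regardless of how long we run.
def simulate(x0, p0, L=1.0, dt=0.01, steps=10000):
    x, p = x0, p0
    visited = set()
    for _ in range(steps):
        x += p * dt
        if x <= 0.0 or x >= L:
            p = -p                      # elastic, perpendicular bounce
            x = min(max(x, 0.0), L)     # clamp back into the box
        visited.add(round(p, 10))
    return visited

print(sorted(simulate(0.5, 0.3)))  # → [-0.3, 0.3]
```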

Another case from Wikipedia:

Pros

1. The Poincaré recurrence theorem proves that at least some systems will come back to a state that is very close to the initial state after a long but finite time.
2. Systems are often chaotic, so it is not really possible to have pictures like the first one in Cons.

Poincaré Recurrence Theorem

There is a very interesting paper regarding the recurrence theorem.

Dyson, F. J., & Falk, H. (1992). Period of a Discrete Cat Mapping. The American Mathematical Monthly, 99(7), 603.

The Liouville density evolution is written as

$\frac{\partial}{\partial t} \rho = \{ H, \rho \},$

while the von Neumann equation is

$i\hbar \frac{\partial}{\partial t} \hat\rho = [\hat H, \hat\rho ].$

The two equations are combined into one unified form

$i \frac{\partial\rho}{\partial t} = \hat L \rho$

where $$\hat L$$ is the Liouville operator.

The thermodynamic observables are calculated using the probabilities of each state,

$\avg{O} = \mathrm{Tr} \rho O$

As long as $$\rho$$ is unchanged, the detailed dynamics and kinematics of the microscopic configurations do not contribute to the dynamics of the macroscopic observables.

Why does Ensemble Theory Work?

In principle, only one trajectory in phase space represents the actual kinematics of one lab system. Why do statistical sampling and statistical averaging work?

In ensemble theory, what we calculate is the ensemble average. Since we only have one lab system and the lab system evolves in time, the time average is the desired result. Since ensemble theory assumes the replicas to be in equilibrium, each replica serves as a snapshot of the system at a different time. In this sense, the ensemble average should be the same as the time average.
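Formally, this identification is the ergodic hypothesis: the long-time average along one trajectory equals the phase space average over the ensemble density,

$\lim_{T\to\infty}\frac{1}{T}\int_0^T \mathscr O\big(p(t), q(t)\big)\, dt = \int \mathscr O(p, q)\, \rho(p, q)\, d\Omega .$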

## Equilibrium¶

Equilibrium means that

$\frac{\partial \rho}{\partial t} = 0$

or equivalently,

$\{ H, \rho \} =0$

Applying the knowledge of quantum mechanics, one possible solution is

$\rho \propto e^{-\beta H}$
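Indeed, any density of the form $$\rho = f(H)$$ satisfies the equilibrium condition, since the Poisson bracket with the Hamiltonian vanishes term by term:

$\{H, f(H)\} = \sum_i \left( \frac{\partial H}{\partial q_i} \frac{\partial f}{\partial p_i} - \frac{\partial H}{\partial p_i} \frac{\partial f}{\partial q_i} \right) = f'(H) \sum_i \left( \frac{\partial H}{\partial q_i} \frac{\partial H}{\partial p_i} - \frac{\partial H}{\partial p_i} \frac{\partial H}{\partial q_i} \right) = 0 .$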

## The Three Most Used Ensembles¶

Table 1 Three Most Used Ensembles and Systems

| Systems | Ensembles | Geometry in Phase Space | Key Variables |
| --- | --- | --- | --- |
| Isolated | Microcanonical | Shell; $$\rho = c'\delta(E-H)$$ | Energy $$E$$ |
| Weakly interacting | Canonical | $$\rho \propto e^{-\beta H}$$ | Temperature $$T$$ |
| Exchanging particles | Grand canonical | $$\rho \propto e^{-\beta (H - \mu N)}$$ | Temperature $$T$$, chemical potential $$\mu$$ |

## Isolated System - Micro-canonical Ensemble¶

The system stays on the energy shell in phase space,

$\rho(p, q; 0) = \delta(H(p, q; 0) - E)$

The energy is also constant,

$H(p, q; t) = E$

For ergodic systems, ensemble average is equal to time average.

Important

What if the system stays longer in some regions of the phase space shell? How can we make sure the ensemble average is the same as the time average?

Microcanonical ensembles are for isolated systems.

$\rho \propto \frac{1}{\text{No. of states on the ensemble surface}} \equiv \frac{1}{\Omega (E)}$

To calculate the entropy, the famous relation by Boltzmann is applied

$S = k_B \ln \Omega$
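As a quick sanity check (a toy example, not from the original text): for $$N$$ two-level spins with $$n$$ in the upper state, $$\Omega = \binom{N}{n}$$, and $$\ln\Omega$$ is well approximated by its Stirling form for large $$N$$, which is what makes $$S = k_B\ln\Omega$$ extensive.

```python
import math

# Microcanonical entropy S/k_B = ln Ω for N two-level spins with n up,
# compared against the Stirling (large-N) approximation.
def ln_omega(N, n):
    # ln C(N, n) via log-gamma, avoiding factorial overflow
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)

def stirling(N, n):
    # N -> infinity limit: ln Ω ≈ -N [f ln f + (1-f) ln(1-f)], f = n/N
    f = n / N
    return -N * (f * math.log(f) + (1 - f) * math.log(1 - f))

N, n = 1000, 300
print(ln_omega(N, n), stirling(N, n))
```

Already at $$N = 1000$$ the two values agree to better than one percent.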

## Canonical Ensemble¶

For a system weakly interacting with a heat bath, the total energy of the system is

$E_T = E_S + E_R + E_{S,R}$

where the interaction energy $$E_{S,R}$$ is very small compared to the system energy, $$E_{S,R}\ll E_{S}$$. We drop this interaction term, so that

$E_T = E_S + E_R$

A simple and intuitive derivation of the probability density is to use the theory of independent events.

1. $$\rho_T d\Omega_T$$: probability of finding the total system in phase space volume $$d\Omega_T$$;
2. $$\rho_S d\Omega_S$$: probability of finding the system in phase space volume $$d\Omega_S$$;
3. $$\rho_R d\Omega_R$$: probability of finding the reservoir in phase space volume $$d\Omega_R$$.

We have assumed weak interactions between the system and the reservoir, so (approximately) the probabilities in the system phase space and in the reservoir phase space are independent of each other,

$\rho _ T d\Omega_T = \rho _S d\Omega_S \cdot \rho _R d \Omega_R .$

Since there is no particle exchange between the two systems, the overall phase space volume is the product of the system and reservoir phase space volumes,

$d\Omega_T = d\Omega _S \cdot d\Omega_R .$

Obviously we can get the relationship between the three probability densities,

$\rho_T = \rho_R \rho_S .$

Take the logarithm of $$\rho_T$$,

$\ln\rho_T = \ln\rho_R + \ln\rho_S .$

Key: $$\rho$$ is a function of the energy $$E$$, and both $$\ln\rho$$ (by the additivity above) and the energy are additive over subsystems. The only possible form of $$\ln \rho$$ is then linear in $$E$$.

Finally we get,

$\ln \rho = - \alpha - \beta E$

i.e.,

$\rho = e^{-\alpha} e^{-\beta E}$

which is called the canonical distribution.
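A small numerical check of this result (a hypothetical toy model, not the book's derivation): take a reservoir of two-level units. The probability that the system holds $$E$$ energy quanta is proportional to the reservoir's microstate count $$\Omega_R(E_T - E)$$, and the successive ratios are nearly constant, i.e., $$\rho \propto e^{-\beta E}$$.

```python
import math

# p(E) ∝ Ω_R(E_total - E) for a reservoir of N_res two-level units;
# the ratios p(E+1)/p(E) should be nearly constant, = exp(-β).
def ln_omega(N, n):
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)

N_res, E_total = 10000, 3000  # reservoir units, total energy quanta
ln_p = {E: ln_omega(N_res, E_total - E) for E in range(5)}
ratios = [math.exp(ln_p[E + 1] - ln_p[E]) for E in range(4)]
print(ratios)
```

The ratios agree to a few parts in $$10^3$$: the huge reservoir turns exact state counting into an exponential Boltzmann factor for the small system.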

Warning

This is not a rigorous derivation. Read RuKeng Su’s book for a more detailed and rigorous derivation.

## Grand Canonical Ensemble¶

Systems with changing particle number are described by grand canonical ensemble.

Here we only write down the partition function of the grand canonical ensemble,

$\mathcal Z = \sum_N \sum_n e^{-\beta (E_n - \mu N)} = \sum_N \left( \sum_n e^{- \beta E_n} \right) \left( e^{\beta\mu} \right)^N = \sum_N Z_N \, z^N ,$

where $$Z_N$$ is the canonical partition function at fixed particle number $$N$$ and $$z = e^{\beta\mu}$$ is the fugacity.

## Identical Particles¶

If a system consists of $$N$$ identical particles, then for any system state $$\xi$$ with $$n_i^\xi$$ particles in single-particle state $$i$$, the energy of the system in state $$\xi$$ is

$E^\xi = \sum_i \epsilon_i n_i^\xi$

where the summation is over all possible single-particle states.

The energy is given by ensemble average

$\avg{E} = \frac{\sum _\xi e^{-\beta E^\xi} E^\xi}{\sum _\xi e^{-\beta E^\xi}}$

where the sum over $$\xi$$ runs over all system states in the ensemble.

## The Three Ensembles Continued¶

The three ensembles agree when the particle number $$N$$ of the system becomes large enough, $$N\rightarrow \infty$$.

Here are the arguments:

1. When $$N$$ is large enough, the interactions between the system and the reservoir are negligible. Extreme energies are rarely observed, so the distributions approach a Gaussian whose relative width is proportional to $$1/\sqrt{N}$$.
2. We have $$dE_S+dE_R=dE$$ and $$dE=0$$: when the energy of the system increases, that of the reservoir drops.

## Most Probable Distribution¶

Boltzmann’s theory is about the most probable distribution of the single particle energy states.

1. For classical distinguishable particles, $$a_l = w_l e^{-\alpha -\beta e_l}$$;
2. For bosons, $$a_l = w_l \frac{1}{e^{\alpha+\beta e_l} - 1}$$;
3. For fermions, $$a_l = w_l \frac{1}{e^{\alpha + \beta e_l} + 1}$$.

The figure shows that the three curves converge when the factor $$\alpha + \beta e_l$$ is large enough. Fermions have fewer microstates than classical particles because of the Pauli exclusion principle.
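The convergence can also be checked numerically (a quick sketch, not from the original notes): writing $$x = \alpha + \beta e_l$$, the three occupations $$a_l/w_l$$ are compared directly.

```python
import math

# Occupation per state a_l/w_l for the three statistics as a function
# of x = α + β e_l; the three values converge for large x.
def mb(x):  # Maxwell-Boltzmann
    return math.exp(-x)

def be(x):  # Bose-Einstein
    return 1.0 / (math.exp(x) - 1.0)

def fd(x):  # Fermi-Dirac
    return 1.0 / (math.exp(x) + 1.0)

for x in (0.5, 2.0, 8.0):
    print(x, mb(x), be(x), fd(x))
```

At $$x = 8$$ the three occupations already agree to better than one part in a thousand, while at small $$x$$ bosons bunch above and fermions sit below the classical value.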

$$\alpha + \beta e_l$$ being large corresponds to several different physical meanings.

1. Temperature is low;
2. Energy is high;
3. Chemical coupling coefficient $$\alpha$$ is large.

We have several equivalent conditions for the three distributions to be the same.

$\alpha + \beta e_l \gg 1 \Leftrightarrow \alpha \gg 1 \Leftrightarrow 1/\exp(\alpha + \beta e_l) \ll 1 \Leftrightarrow a_l / w_l \ll 1$

where the last statement is quite interesting: $$a_l/w_l \ll 1$$ means we have many more states than particles, so the quantum effects become very small.

Warning

One should be careful: even when the above conditions are satisfied, the number of microstates for classical particles is very different from that for quantum particles,

$\Omega_{B.E.} \approx \Omega_{F.D.} \approx \Omega_{M.B.}/N! .$

This will have effects on entropy eventually.

Recall that the thermal wavelength $$\lambda_T$$ is a useful quantity for assessing quantum effects. At high temperature, the thermal wavelength becomes small and the system is more classical.

Hint

1. Massive particles $$\lambda_t = \frac{h}{p} = \frac{h}{\sqrt{2m K}} = \frac{h}{\sqrt{ 2\pi m k T }}$$
2. Massless particles $$\lambda_t = \frac{c h}{2\pi^{1/3} k T}$$
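For a concrete number (helium-4 at room temperature, using standard values of the physical constants), the massive-particle formula gives $$\lambda_T \approx 0.05\,\mathrm{nm}$$, much smaller than the typical interparticle spacing, hence classical behavior:

```python
import math

# Thermal de Broglie wavelength λ_T = h / sqrt(2π m k_B T) for a
# massive particle; helium-4 at 300 K as a worked example.
h = 6.626e-34     # Planck constant, J·s
kB = 1.381e-23    # Boltzmann constant, J/K
m_He = 6.646e-27  # helium-4 mass, kg

def thermal_wavelength(m, T):
    return h / math.sqrt(2.0 * math.pi * m * kB * T)

lam = thermal_wavelength(m_He, 300.0)  # ~5e-11 m, i.e. ~0.05 nm
print(lam)
```

Cooling the gas grows $$\lambda_T \propto T^{-1/2}$$, which is why quantum statistics take over at low temperature.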

However, even at high temperature the three microstate numbers are still very different. This is because the thermal wavelength considers the motion of the particles, and high temperature means large momentum, hence classical behavior; the number of microstates, on the other hand, comes from counting the occupation of states.

Important

What’s the difference between ensemble probability density and most probable distribution? What makes the +1 or -1 in the occupation number?

The most probable distribution is the method used in Boltzmann’s theory, while the ensemble probability density belongs to ensemble theory. In ensemble theory, all copies (states) in a canonical ensemble appear with a probability density $$\exp(-\beta E)$$, and all information about the type of particles is in the Hamiltonian.

Unlike ensemble theory, Boltzmann’s theory deals with the number of microstates, which is affected by the type of particles. Suppose we have $$N$$ particles in a system occupying energy levels $$e_l$$ with $$a_l$$ particles on each. Note that each energy level has a degeneracy $$w_l$$.


For Boltzmann’s theory, we need to

1. Calculate the number of micro states of the system;
2. Calculate most probable distribution using Lagrange multipliers;
3. Calculate the average of an observable using the most probable distribution.

Calculation of number of micro states

Calculation of the number of micro states requires some basic knowledge of different types of particles.

For classical particles, we can distinguish each particle from the others, and there is no restriction on the number of particles in each state. This gives $$w_l^{a_l}$$ possible assignments within each energy level. Since the particles are distinguishable, there are $$N!$$ ways of ordering them; however, exchanging particles within the same energy level does not produce a new distribution, which removes a factor of $$\prod_l a_l!$$.

Finally we have

$\Omega_{M.B.} = \frac{N!}{\prod_l a_l !} \prod_l w_l^{a_l}$

as the number of possible states.

With similar techniques, which are explained explicitly in Wang’s book, we get the number of microstates for the other two types of particles.

$\Omega_{B.E.} = \prod_l \frac{(a_l+w_l-1)!}{a_l!(w_l -1)!}$

is the number of microstates for a boson system with distribution $$\{a_l\}$$.

$\Omega_{F.D.} = \prod_l C_{w_l}^{a_l} = \prod_l \frac{w_l!}{a_l!(w_l - a_l)!}$

is the number of microstates for a fermion system with distribution $$\{a_l\}$$. We get this because we just need to pick $$a_l$$ of the $$w_l$$ states for the $$a_l$$ particles on each energy level.
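The three per-level counting formulas can be verified by brute-force enumeration for a tiny example (one level with $$w_l = 3$$ states and $$a_l = 2$$ particles; a sketch, not from the book):

```python
from math import comb
from itertools import product

# Closed-form per-level microstate counts for the three statistics.
def omega_mb(a, w): return w ** a           # distinguishable particles
def omega_be(a, w): return comb(a + w - 1, a)  # bosons (multisets)
def omega_fd(a, w): return comb(w, a)          # fermions (subsets)

a, w = 2, 3
# brute force: each labelled particle goes into one of w states
mb = len(list(product(range(w), repeat=a)))
# bosons: only the multiset of occupied states matters
be = len({tuple(sorted(p)) for p in product(range(w), repeat=a)})
# fermions: multiset, and no state may be occupied twice
fd = len({tuple(sorted(p)) for p in product(range(w), repeat=a)
          if len(set(p)) == a})
assert (mb, be, fd) == (omega_mb(a, w), omega_be(a, w), omega_fd(a, w))
print(mb, be, fd)  # → 9 6 3
```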

## The Exact Partition Function¶

DoS and partition function have already been discussed in previous notes.

## Is There A Gap Between Fermion and Boson?¶

Suppose we know only the M.B. distribution; applying it to harmonic oscillators, we find that

$\avg{H} = (\bar n + 1/2)\hbar \omega$

where $$\bar n$$ is given by

$\bar n = \frac{1}{e^{\beta \hbar \omega} - 1}$

which is exactly the Bose–Einstein occupation number, clearly indicating a new type of boson particle.
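The average occupation follows directly from the M.B. weights over the oscillator levels, using the geometric sums $$\sum_n x^n = 1/(1-x)$$ and $$\sum_n n x^n = x/(1-x)^2$$ with $$x = e^{-\beta\hbar\omega}$$:

$\bar n = \frac{\sum_{n=0}^{\infty} n\, e^{-\beta n \hbar\omega}}{\sum_{n=0}^{\infty} e^{-\beta n \hbar\omega}} = \frac{x/(1-x)^2}{1/(1-x)} = \frac{x}{1-x} = \frac{1}{e^{\beta\hbar\omega} - 1} .$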

So classical statistical mechanics and quantum statistical mechanics are closely connected, not only through the microstate numbers but also in a more fundamental way.

Hint

Note that this is possible because the energy differences between adjacent energy levels are all the same. Adding one imagined particle with energy $$\hbar\omega$$ is equivalent to exciting an oscillator to the next higher level, so we can treat the imagined particles as bosons.

Created with Sphinx.