Notation. The Latin indices such as $i$, $j$, $k$, etc., typically span the
three spatial coordinate labels, commonly denoted as 1, 2, 3. On the other
hand, Greek indices like $\mu$, $\nu$, and so forth, usually range over the
four spacetime coordinate labels, specifically 1, 2, 3, 0, where $x^0$
represents the time coordinate. Indices that appear twice are usually summed
unless specified otherwise. The spacetime metric, denoted by $\eta_{\mu\nu}$,
is a diagonal matrix with elements $\eta_{11}=\eta_{22}=\eta_{33}=1$ and
$\eta_{00}=-1$. The d'Alembertian is represented as $\Box$ and defined by the
equation
$\Box \equiv \eta^{\mu\nu}\,\partial^2/\partial x^\mu\partial x^\nu = \nabla^2 - \partial^2/\partial t^2$,
where $\nabla^2$ is the Laplacian given by
$\partial^2/\partial x^i\partial x^i$. The ‘Levi-Civita tensor,’ symbolized
by $\epsilon^{\mu\nu\rho\sigma}$, is an entirely antisymmetric entity with
$\epsilon^{0123}=+1$.
Spatial vectors in three dimensions are marked by boldface characters. A
unit vector corresponding to any vector is shown with a hat, as in
$\hat{\mathbf{v}} \equiv \mathbf{v}/|\mathbf{v}|$. A dot over a variable
signifies its time derivative. The Dirac matrices $\gamma_\mu$ adhere to
$\gamma_\mu\gamma_\nu+\gamma_\nu\gamma_\mu = 2\eta_{\mu\nu}$, and
$\gamma_5 = i\gamma_0\gamma_1\gamma_2\gamma_3$, while $\beta = i\gamma^0$.
The step function $\theta(s)$ yields a value of +1 when $s>0$ and 0 when
$s<0$.
For a matrix or vector $A$, the complex conjugate, transpose, and Hermitian
adjoint are represented by $A^*$, $A^{T}$, and $A^\dagger = A^{*T}$
respectively. The Hermitian adjoint of an operator $O$ is marked as
$O^\dagger$, except when an asterisk emphasizes that a vector or matrix of
operators is not transposed. Terms like +H.c. or +c.c. appended to equations
indicate the addition of the Hermitian adjoint or complex conjugate of
preceding terms. A Dirac spinor $u$ with a bar over it is defined as
$\bar{u} = u^\dagger\beta$. Apart from in Chapter ?, units are normalized
such that $\hbar$ and the speed of light are set to one. The fine structure
constant is represented as $\alpha$, calculated as $e^2/4\pi$, approximately
$1/137$, where $-e$ is the rationalized charge of the electron. Parenthetical
numbers next to quoted numerical figures signify the uncertainty in the last
digits. Unless otherwise stated, experimental data is sourced from ‘Review of
Particle Properties,’ Phys. Rev. D50, 1173 (1994).
The perspective presented here is that quantum field theory has its current form because, with certain qualifications, it is the only way to reconcile quantum mechanics with special relativity. Our first task is therefore to explore how symmetries, such as Lorentz invariance, appear in a quantum setting.
Quantum field theory rests on the same foundational quantum mechanics developed by Schrödinger, Heisenberg, Pauli, Born, and other pioneers in 1925–1926.
Physical states are represented by rays in a complex Hilbert space
(the inner product is denoted by $(\Phi,\Psi)$, with the first slot
antilinear (conjugate-linear) and the second slot linear). Here, a ray is a
set of normalized vectors (i.e. $(\Psi,\Psi)=1$), with $\Psi$ and $\Psi'$
belonging to the same ray if $\Psi'=\xi\Psi$, where $\xi$ is an arbitrary
complex number with $|\xi|=1$.
Observables are represented by Hermitian operators. A state represented by a
ray $\mathcal{R}$ has a definite value $\alpha$ for the observable
represented by an operator $A$ if vectors $\Psi$ belonging to this ray are
eigenvectors of $A$ with eigenvalue $\alpha$:
$$A\Psi = \alpha\Psi.$$
If a system is in a state represented by a ray $\mathcal{R}$, and an
experiment is done to test whether it is in any one of the different states
represented by mutually orthogonal rays
$\mathcal{R}_1, \mathcal{R}_2, \ldots$ (for instance, by measuring one or
more observables), then the probability of finding it in the state
represented by $\mathcal{R}_n$ is
$$P(\mathcal{R}\to\mathcal{R}_n) = |(\Psi,\Psi_n)|^2,$$
where $\Psi$ and $\Psi_n$ are any vectors belonging to rays $\mathcal{R}$
and $\mathcal{R}_n$, respectively.
A symmetry transformation can be thought of as a shift in perspective
that does not affect the outcomes of potential experiments. If an
observer $O$ perceives a system in a state denoted by a ray $\mathcal{R}$ or
$\mathcal{R}_1$ or $\mathcal{R}_2$ ..., a corresponding observer $O'$
scrutinizing the same system would view it in a different state, symbolized
by a ray $\mathcal{R}'$ or $\mathcal{R}'_1$ or $\mathcal{R}'_2$ ...,
respectively. However, both observers must ascertain the same probabilities:
$$P(\mathcal{R}\to\mathcal{R}_n) = P(\mathcal{R}'\to\mathcal{R}'_n). \tag{1.2.1}$$
This condition is necessary but not sufficient for a ray
transformation to qualify as a symmetry; additional conditions will be
elaborated upon in the following chapter. Wigner proved a significant
theorem in the early 1930s, stating that for any such transformation
$\mathcal{R}\to\mathcal{R}'$, an operator $U$ can be defined in the Hilbert
space such that if $\Psi$ is a vector in ray $\mathcal{R}$, then $U\Psi$
belongs to ray $\mathcal{R}'$.
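The operator $U$ can either be unitary and linear:
$$(U\Phi,U\Psi) = (\Phi,\Psi), \tag{1.2.2}$$
$$U(\xi\Phi+\eta\Psi) = \xi\,U\Phi+\eta\,U\Psi, \tag{1.2.3}$$
or antiunitary and antilinear:
$$(U\Phi,U\Psi) = (\Phi,\Psi)^*, \tag{1.2.4}$$
$$U(\xi\Phi+\eta\Psi) = \xi^*\,U\Phi+\eta^*\,U\Psi, \tag{1.2.5}$$
for all $\Phi$, $\Psi$ in the Hilbert space.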
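As a minimal numerical sketch of the two alternatives, the following Python snippet applies a unitary map and an antiunitary map to two state vectors in a two-dimensional Hilbert space and compares inner products. The matrix `U`, the test vectors, and all helper names are illustrative assumptions, not taken from the text; the antiunitary map is modelled as complex conjugation followed by `U`.

```python
import cmath
import math

def inner(phi, psi):
    """Inner product (phi, psi): antilinear in phi, linear in psi."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

def matvec(m, v):
    """Apply the matrix m to the vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

# An illustrative 2x2 unitary matrix (orthonormal columns).
t = 0.7
phase = cmath.exp(0.3j)
U = [[phase * math.cos(t), -math.sin(t)],
     [phase * math.sin(t),  math.cos(t)]]

def unitary(psi):
    """Unitary, linear map."""
    return matvec(U, psi)

def antiunitary(psi):
    """Antiunitary, antilinear map: complex conjugation, then U."""
    return matvec(U, [z.conjugate() for z in psi])

# Two normalized test vectors (hypothetical choices).
phi = [0.6 + 0j, 0.8j]
psi = [(1 + 1j) / 2, (1 - 1j) / 2]

original = inner(phi, psi)
after_unitary = inner(unitary(phi), unitary(psi))              # equals (phi, psi)
after_antiunitary = inner(antiunitary(phi), antiunitary(psi))  # equals (phi, psi)*
probability = abs(original) ** 2
```

In both cases the transition probability `abs(...) ** 2` is unchanged, which is all that condition (1.2.1) requires; the two cases differ only in whether the inner product itself is preserved or complex-conjugated.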
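This finding is called the fundamental theorem of Wigner; its statement and proof are as follows.
The fundamental theorem of Wigner (1931).
Let $\mathcal{H}$ be a Hilbert space and let $T$ be a bijection on the rays
of $\mathcal{H}$ satisfying
$P(T\mathcal{R}_1\to T\mathcal{R}_2) = P(\mathcal{R}_1\to\mathcal{R}_2)$
for all rays $\mathcal{R}_1$ and $\mathcal{R}_2$. Then there exists an
operator $U$ acting on $\mathcal{H}$ such that $U\Psi\in T\mathcal{R}$ for
all rays $\mathcal{R}$ and all $\Psi\in\mathcal{R}$; and that either $U$ is
unitary and linear or antiunitary and antilinear.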
Proof.
As previously stated, the adjoint of a linear operator $L$ is determined by
$$(\Phi,L^\dagger\Psi) \equiv (L\Phi,\Psi). \tag{1.2.6}$$
This definition does not apply to an antilinear operator since the
right-hand side of (1.2.6) would be linear in $\Phi$, while the left-hand
side is antilinear in $\Phi$. For an antilinear operator $A$, the adjoint is
instead specified as:
$$(\Phi,A^\dagger\Psi) \equiv (A\Phi,\Psi)^* = (\Psi,A\Phi). \tag{1.2.7}$$
Given this definition, the criteria for either unitarity or antiunitarity can both be expressed as:
$$U^\dagger = U^{-1}. \tag{1.2.8}$$
There exists a trivial symmetry transformation $\mathcal{R}\to\mathcal{R}$,
represented by the identity operator $U=1$.
This operator is naturally unitary and linear. Continuity dictates that
any symmetry operation (like a rotation, translation, or Lorentz
transformation) that can be reduced to a trivial transformation by
continuously adjusting certain parameters (such as angles, distances, or
velocities) must be characterized by a linear unitary operator $U$, as
opposed to one that is antilinear and antiunitary. (Symmetries represented
by antiunitary antilinear operators are less common in physics; they all
entail a reversal in the direction of time flow. See Section ? for
more details.)
Specifically, a symmetry transformation that is nearly trivial on an infinitesimal scale can be depicted by a linear unitary operator that is infinitesimally close to the identity operator:
$$U = 1 + i\epsilon t. \tag{1.2.9}$$
Here, $\epsilon$ is a real infinitesimal. For $U$ to be both unitary and
linear, $t$ needs to be Hermitian and linear, making it a potential
observable. In fact, many (if not all) physical observables, like angular
momentum or momentum, are derived from symmetry transformations in this
manner.
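The link between unitarity of $U = 1 + i\epsilon t$ and Hermiticity of $t$ can be illustrated numerically. The sketch below measures how far $U^\dagger U$ is from the unit matrix; the two sample matrices and the function names are assumptions made for the example. For Hermitian $t$ the deviation is of order $\epsilon^2$, while for a non-Hermitian $t$ it already appears at order $\epsilon$.

```python
def dagger(m):
    """Hermitian adjoint (conjugate transpose) of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def deviation_from_unitarity(t, eps):
    """Largest entry of |U^dagger U - 1| for U = 1 + i*eps*t."""
    n = len(t)
    u = [[(1 if i == j else 0) + 1j * eps * t[i][j] for j in range(n)]
         for i in range(n)]
    prod = matmul(dagger(u), u)
    return max(abs(prod[i][j] - (1 if i == j else 0))
               for i in range(n) for j in range(n))

# Illustrative generators (hypothetical choices).
t_hermitian = [[1, 2 - 1j], [2 + 1j, -3]]
t_not_hermitian = [[1, 2], [0, -3]]

eps = 1e-3
dev_hermitian = deviation_from_unitarity(t_hermitian, eps)          # order eps**2
dev_not_hermitian = deviation_from_unitarity(t_not_hermitian, eps)  # order eps
```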
The set of symmetry transformations possesses specific characteristics
that categorize it as a group. If $T_1$ is a transformation converting rays
$\mathcal{R}_n$ to $\mathcal{R}'_n$, and $T_2$ is another transformation
that maps $\mathcal{R}'_n$ to $\mathcal{R}''_n$, then the outcome of
executing both transformations consecutively is yet another symmetry
transformation, denoted as $T_2T_1$, that transforms $\mathcal{R}_n$ into
$\mathcal{R}''_n$. Additionally, any symmetry transformation $T$ that
changes $\mathcal{R}_n$ into $\mathcal{R}'_n$ has an inverse, expressed as
$T^{-1}$, which reverts $\mathcal{R}'_n$ back to $\mathcal{R}_n$.
Moreover, there exists an identity transformation, $T=1$, which leaves rays
unaltered.
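The unitary or antiunitary operators $U(T)$ that correspond to these
symmetry transformations emulate this group structure, albeit with added
complexity because operators act on vectors in Hilbert space instead of on
rays. If $T_1$ transforms $\mathcal{R}_n$ into $\mathcal{R}'_n$, then
applying $U(T_1)$ to a vector $\Psi_n$ in $\mathcal{R}_n$ must result in a
vector $U(T_1)\Psi_n$ in $\mathcal{R}'_n$. If $T_2$ then maps this ray to
$\mathcal{R}''_n$, $U(T_2)U(T_1)\Psi_n$ must also belong to
$\mathcal{R}''_n$, as must $U(T_2T_1)\Psi_n$. Therefore, the vectors can
only differ by a phase factor $e^{i\phi_n(T_2,T_1)}$, as given by:
$$U(T_2)U(T_1)\Psi_n = e^{i\phi_n(T_2,T_1)}\,U(T_2T_1)\Psi_n. \tag{1.2.10}$$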
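Moreover, barring a notable exception, the linearity (or antilinearity)
of $U(T)$ specifies that these phases are state-independent. To prove this,
let us consider two non-proportional vectors $\Psi_A$ and $\Psi_B$ and apply
Equation (1.2.10) to the state $\Psi_{AB}\equiv\Psi_A+\Psi_B$:
$$e^{i\phi_{AB}}\,U(T_2T_1)(\Psi_A+\Psi_B) = U(T_2)U(T_1)(\Psi_A+\Psi_B) = e^{i\phi_A}\,U(T_2T_1)\Psi_A + e^{i\phi_B}\,U(T_2T_1)\Psi_B. \tag{1.2.11}$$
Every unitary or antiunitary operator has an inverse (its adjoint),
which is also either unitary or antiunitary. Upon left-multiplying
Equation (1.2.11) by $U^{-1}(T_2T_1)$, we arrive at:
$$e^{\pm i\phi_{AB}}(\Psi_A+\Psi_B) = e^{\pm i\phi_A}\Psi_A + e^{\pm i\phi_B}\Psi_B, \tag{1.2.12}$$
where the upper and lower signs refer to $U(T_2T_1)$ unitary or antiunitary,
respectively. As $\Psi_A$ and $\Psi_B$ are linearly independent, it follows
that
$$e^{i\phi_{AB}} = e^{i\phi_A} = e^{i\phi_B}. \tag{1.2.13}$$
Consequently, the phase in Equation (1.2.10) is state-independent, leading to the operator relation:
$$U(T_2)U(T_1) = e^{i\phi(T_2,T_1)}\,U(T_2T_1). \tag{1.2.14}$$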
When $\phi=0$, this indicates that $U(T)$ constitutes a representation of
the group of symmetry transformations. For arbitrary phases
$\phi(T_2,T_1)$, we refer to it as a ‘projective representation’ or a
representation ‘up to a phase’. Whether physical state vectors furnish an
ordinary or a projective representation cannot be inferred from the Lie
group structure alone, but will become apparent later.
The exception to the reasoning that concluded in Equation (1.2.14)
lies in the possibility that the system may not be preparable in a state
represented by $\Psi_A+\Psi_B$. For example, it is generally considered
unfeasible to prepare a system in a superposition of states with total
angular momenta that are integers and half-integers. In such scenarios, we
refer to the presence of a ‘superselection rule’ between different
categories of states. As a result, the phases $\phi(T_2,T_1)$ could be
contingent on which class of states the operators $U(T_2)U(T_1)$ and
$U(T_2T_1)$ are acting upon. Further details about these phases and
projective representations will be discussed in Section ?. It will be shown
that any symmetry group featuring projective representations can be extended
(without altering its physical meaning) to allow for all its representations
to be non-projective, i.e., with $\phi=0$.
Until we reach Section ?, we will proceed with the assumption that such an
extension has been applied, and will take $\phi=0$ in (1.2.14). Also, the
existence of spinors derives in part from this phase ambiguity, which arises
because only absolute values of inner products are physical, together with
the fact that the homogeneous Lorentz group is not simply connected.
In physics, a specific type of group known as a connected Lie group
holds special significance. These are groups comprised of transformations
$T(\theta)$, defined by a finite collection of real, continuous parameters,
symbolized as $\theta^a$. Each group element is linked to the identity
element through a continuous path within the group itself.
The multiplication rule for the group is expressed as
$$T(\bar\theta)\,T(\theta) = T\big(f(\bar\theta,\theta)\big), \tag{1.2.15}$$
where $f^a(\bar\theta,\theta)$ is a function of both $\bar\theta$ and
$\theta$. If $\theta^a=0$ denotes the coordinates of the identity, then
$$f^a(\theta,0) = f^a(0,\theta) = \theta^a \tag{1.2.16}$$
must hold true. In the case of such continuous groups, the
transformations must be represented in the physical Hilbert space by
unitary operators $U(T(\theta))$, rather than antiunitary ones. These
unitary operators, at least in a finite vicinity of the identity, can be
expressed by a power series as
$$U(T(\theta)) = 1 + i\theta^a t_a + \tfrac12\theta^b\theta^c t_{bc} + \cdots. \tag{1.2.17}$$
Here, $t_a$, $t_{bc}=t_{cb}$, and so on, are Hermitian operators
independent of $\theta$.
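Assuming that $U(T(\theta))$ provides a standard (non-projective)
representation of the transformation group, meaning
$$U(T(\bar\theta))\,U(T(\theta)) = U\big(T(f(\bar\theta,\theta))\big), \tag{1.2.18}$$
we can expand this in terms of $\theta^a$ and $\bar\theta^a$. In accordance
with Equation (1.2.16), the second-order expansion of
$f^a(\bar\theta,\theta)$ should be
$$f^a(\bar\theta,\theta) = \theta^a + \bar\theta^a + f^a{}_{bc}\,\bar\theta^b\theta^c + \cdots. \tag{1.2.19}$$
Here, $f^a{}_{bc}$ are real coefficients. Note that the presence of any
$\theta^2$ or $\bar\theta^2$ terms would be in conflict with Equation
(1.2.16). Following this, Equation (1.2.18) can be articulated as:
$$\big[1 + i\bar\theta^a t_a + \tfrac12\bar\theta^b\bar\theta^c t_{bc} + \cdots\big]\big[1 + i\theta^a t_a + \tfrac12\theta^b\theta^c t_{bc} + \cdots\big] = 1 + i\big(\theta^a + \bar\theta^a + f^a{}_{bc}\bar\theta^b\theta^c + \cdots\big)t_a + \tfrac12\big(\theta^b+\bar\theta^b+\cdots\big)\big(\theta^c+\bar\theta^c+\cdots\big)t_{bc} + \cdots. \tag{1.2.20}$$
On both sides of Equation (1.2.20), terms of order 1, $\theta$,
$\bar\theta$, $\theta^2$, and $\bar\theta^2$ correspond without issue.
However, when focusing on the $\bar\theta\theta$ terms, a non-trivial
condition emerges:
$$t_{bc} = -t_b t_c - i f^a{}_{bc}\,t_a. \tag{1.2.21}$$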
This reveals that if we know the group structure, specifically the
function $f(\theta,\bar\theta)$ and its corresponding quadratic coefficient
$f^a{}_{bc}$, we can determine the second-order terms of $U(T(\theta))$
using the first-order generators $t_a$.
However, there's a requirement for consistency: the operator $t_{bc}$ has to
be symmetric in $b$ and $c$, as it's the second derivative of
$U(T(\theta))$ with respect to $\theta^b$ and $\theta^c$. Therefore,
Equation (1.2.21) necessitates that
$$[t_b,t_c] = i\,C^a{}_{bc}\,t_a, \tag{1.2.22}$$
where $C^a{}_{bc}$ are a set of real constants termed structure constants,
defined by
$$C^a{}_{bc} \equiv -f^a{}_{bc} + f^a{}_{cb}. \tag{1.2.23}$$
This kind of commutation relationship is termed a Lie algebra. In a
later section, we will essentially demonstrate that the commutation
relation (1.2.22) is the sole condition needed to continue this
computation to all orders. In other words, the complete power series for
$U(T(\theta))$ can be generated from an endless chain of equations like
Equation (1.2.21), as long as we are aware of the first-order terms, namely
the generators $t_a$. While this does not mean the operators $U(T(\theta))$
are uniquely identified for all $\theta^a$ based solely on $t_a$, it does
signify that they are uniquely specified within a finite vicinity of the
identity coordinate $\theta^a=0$, such that Equation (1.2.15) holds true if
$\theta$, $\bar\theta$, and $f(\theta,\bar\theta)$ lie within this region.
The discussion about extending this to all $\theta^a$ will take place in a
subsequent section.
There is a particular scenario of considerable relevance that will recur
frequently in our discussions. Assume the function $f^a(\theta,\bar\theta)$
is simply additive for some or all of the coordinates $\theta^a$, as
expressed by:
$$f^a(\theta,\bar\theta) = \theta^a + \bar\theta^a. \tag{1.2.24}$$
This situation is applicable, for example, in the context of spacetime
translations or for rotations about a single fixed axis (but not for
both simultaneously). In this special case, the coefficients $f^a{}_{bc}$
from Equation (1.2.19) become zero, and likewise, the structure constants
in Equation (1.2.23) also vanish. Consequently, the generators commute:
$$[t_b,t_c] = 0. \tag{1.2.25}$$
Such a group is termed Abelian. Under these conditions, computing
$U(T(\theta))$ for all $\theta^a$ becomes straightforward. According to
Equations (1.2.18) and (1.2.24), for any integer $N$, we can express:
$$U(T(\theta)) = \big[U\big(T(\theta/N)\big)\big]^N.$$
By taking the limit as $N$ approaches infinity and retaining only the
first-order term in $U(T(\theta/N))$, we obtain:
$$U(T(\theta)) = \lim_{N\to\infty}\Big[1 + \frac{i}{N}\theta^a t_a\Big]^N,$$
and consequently,
$$U(T(\theta)) = \exp(i t_a\theta^a). \tag{1.2.26}$$
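The limiting procedure that leads to (1.2.26) can be checked numerically for a single generator. The sketch below is an illustration under stated assumptions: the generator is taken to be the Pauli matrix $\sigma_x$ (a hypothetical choice), for which $\exp(i\theta\sigma_x)=\cos\theta+i\sigma_x\sin\theta$ is known in closed form, and $N$ is taken to be a power of two so that the $N$-th power can be formed by repeated squaring.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def finite_from_infinitesimal(t, theta, doublings):
    """Return [1 + i*theta*t/N]**N with N = 2**doublings."""
    n = 2 ** doublings
    m = [[(1 if i == j else 0) + 1j * theta * t[i][j] / n for j in range(2)]
         for i in range(2)]
    for _ in range(doublings):
        m = matmul(m, m)   # each squaring doubles the exponent
    return m

def max_difference(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

sigma_x = [[0, 1], [1, 0]]   # illustrative Hermitian generator
theta = 1.0

# Closed form of exp(i*theta*sigma_x).
exact = [[math.cos(theta), 1j * math.sin(theta)],
         [1j * math.sin(theta), math.cos(theta)]]

error_coarse = max_difference(finite_from_infinitesimal(sigma_x, theta, 4), exact)
error_fine = max_difference(finite_from_infinitesimal(sigma_x, theta, 20), exact)
```

The error shrinks roughly like $\theta^2/2N$, consistent with the statement that only the first-order term in $U(T(\theta/N))$ matters in the limit.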
Einstein's principle of relativity asserts the equivalence of specific
'inertial' frames of reference, setting it apart from the Galilean
principle of relativity adhered to by Newtonian mechanics. The
distinction comes from the transformation equations that link coordinate
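systems across different inertial frames. Given that $x^\mu$ represents the
coordinates in one inertial frame (where $x^1$, $x^2$, $x^3$ are Cartesian
spatial coordinates and $x^0=t$ is a time coordinate, assuming the speed of
light equals one), the coordinates $x'^\mu$ in another inertial frame must
satisfy:
$$\eta_{\mu\nu}\,dx'^\mu dx'^\nu = \eta_{\mu\nu}\,dx^\mu dx^\nu, \tag{1.3.1}$$
or, alternatively,
$$\eta_{\mu\nu}\,\frac{\partial x'^\mu}{\partial x^\rho}\frac{\partial x'^\nu}{\partial x^\sigma} = \eta_{\rho\sigma}. \tag{1.3.2}$$
In these equations, $\eta_{\mu\nu}$ is a diagonal matrix with
elements defined as:
$$\eta_{11}=\eta_{22}=\eta_{33}=+1, \qquad \eta_{00}=-1. \tag{1.3.3}$$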
The summation convention applies: any index like $\mu$ and $\nu$ in
Equation (1.3.2) appearing twice, once as a superscript and once as a
subscript, is summed over. These transformations have the unique feature
that the speed of light remains the same (in our chosen units, equal to
one) across all inertial frames. A light wave with unit speed satisfies
$|d\mathbf{x}/dt|=1$, or in terms of the equation
$\eta_{\mu\nu}dx^\mu dx^\nu = d\mathbf{x}^2-dt^2=0$, which also implies
$\eta_{\mu\nu}dx'^\mu dx'^\nu=0$ and thus $|d\mathbf{x}'/dt'|=1$.
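Any coordinate transformation $x^\mu\to x'^\mu$ fulfilling Eq. (1.3.2) is
linear, as denoted by:
$$x'^\mu = \Lambda^\mu{}_\nu x^\nu + a^\mu. \tag{1.3.4}$$
Here, $a^\mu$ are arbitrary constants, and $\Lambda^\mu{}_\nu$ is a constant
matrix that meets the criteria:
$$\eta_{\mu\nu}\,\Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma = \eta_{\rho\sigma}. \tag{1.3.5}$$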
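Condition (1.3.5) is easy to test numerically, since in matrix form it reads $\Lambda^T\eta\Lambda=\eta$. The following sketch checks it for a boost, a rotation, and their product, and shows that a simple rescaling of one axis fails it. The index ordering $(0,1,2,3)$ for rows and columns and all function names are conventions assumed for this example only.

```python
import math

# Metric with index order (0, 1, 2, 3): eta_00 = -1, eta_11 = eta_22 = eta_33 = +1.
ETA = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [[a[j][i] for j in range(4)] for i in range(4)]

def satisfies_lorentz_condition(lam, tol=1e-12):
    """Check eta_{mu nu} Lam^mu_rho Lam^nu_sigma = eta_{rho sigma}."""
    lhs = matmul(transpose(lam), matmul(ETA, lam))
    return all(abs(lhs[r][s] - ETA[r][s]) < tol
               for r in range(4) for s in range(4))

def boost_x(v):
    """Boost along the 1-axis with velocity v (units with c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, g * v, 0, 0], [g * v, g, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

def rotation_z(angle):
    """Rotation about the 3-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1.0]]

# A rescaling of the 1-axis: linear, but not a Lorentz transformation.
stretch = [[1.0, 0, 0, 0], [0, 2.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

boost_ok = satisfies_lorentz_condition(boost_x(0.6))
rotation_ok = satisfies_lorentz_condition(rotation_z(0.4))
product_ok = satisfies_lorentz_condition(matmul(rotation_z(0.4), boost_x(0.6)))
stretch_ok = satisfies_lorentz_condition(stretch)
```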
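For certain applications, it's advantageous to express the Lorentz
transformation condition using an alternate formulation. The matrix
$\eta_{\mu\nu}$ possesses an inverse, designated as $\eta^{\mu\nu}$, which
happens to have the same components: it is diagonal, with
$\eta^{00}=-1$ and $\eta^{11}=\eta^{22}=\eta^{33}=+1$.
By judiciously inserting parentheses and multiplying Eq. (1.3.5)
by $\eta^{\sigma\tau}\Lambda^\kappa{}_\tau$, we get:
$$\eta_{\mu\nu}\Lambda^\mu{}_\rho\big(\Lambda^\nu{}_\sigma\Lambda^\kappa{}_\tau\eta^{\sigma\tau}\big) = \Lambda^\kappa{}_\rho = \eta_{\mu\nu}\eta^{\nu\kappa}\Lambda^\mu{}_\rho.$$
Further multiplying by the inverse of the matrix
$\eta_{\mu\nu}\Lambda^\mu{}_\rho$ yields:
$$\Lambda^\nu{}_\sigma\Lambda^\kappa{}_\tau\,\eta^{\sigma\tau} = \eta^{\nu\kappa}. \tag{1.3.6}$$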
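These transformations constitute a group. When we initially apply a
Lorentz transformation as per Eq. (1.3.4), and then follow
it with another Lorentz transformation $x'\to x''$, such that
$$x''^\mu = \bar\Lambda^\mu{}_\rho x'^\rho + \bar a^\mu = \bar\Lambda^\mu{}_\rho\big(\Lambda^\rho{}_\nu x^\nu + a^\rho\big) + \bar a^\mu,$$
we find that the overall transformation effect is identical to
performing a Lorentz transformation $x^\mu\to x''^\mu$ as described by
$$x''^\mu = \big(\bar\Lambda^\mu{}_\rho\Lambda^\rho{}_\nu\big)x^\nu + \big(\bar\Lambda^\mu{}_\rho a^\rho + \bar a^\mu\big). \tag{1.3.7}$$
Here, it's worth noting that if $\Lambda^\mu{}_\nu$ and
$\bar\Lambda^\mu{}_\nu$ both meet the conditions of Eq. (1.3.5),
$\bar\Lambda^\mu{}_\rho\Lambda^\rho{}_\nu$ will also be a Lorentz
transformation. The bar notation is simply used to distinguish one Lorentz
transformation from another. Correspondingly, the transformations
$T(\Lambda,a)$ on physical states obey the composition law
$$T(\bar\Lambda,\bar a)\,T(\Lambda,a) = T(\bar\Lambda\Lambda,\ \bar\Lambda a+\bar a). \tag{1.3.8}$$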
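Calculating the determinant of Eq. (1.3.5), we arrive at
$$(\mathrm{Det}\,\Lambda)^2 = 1. \tag{1.3.9}$$
This implies that $\Lambda^\mu{}_\nu$ has an inverse, denoted as
$(\Lambda^{-1})^\nu{}_\sigma$, which as per Eq. (1.3.5) takes the form
$$(\Lambda^{-1})^\rho{}_\nu = \Lambda_\nu{}^\rho \equiv \eta_{\nu\mu}\eta^{\rho\sigma}\Lambda^\mu{}_\sigma. \tag{1.3.10}$$
According to Eq. (1.3.8), the inverse of the transformation $T(\Lambda,a)$
turns out to be $T(\Lambda^{-1},-\Lambda^{-1}a)$, and naturally, the
identity transformation is represented by $T(1,0)$.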
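The composition law (1.3.8) and the inverse built from (1.3.10) can be exercised on coordinates directly. In the sketch below a transformation is stored as a pair `(lam, a)`; this representation, the index ordering $(0,1,2,3)$, and the sample numbers are assumptions of the example. In matrix form, (1.3.10) reads $\Lambda^{-1}=\eta\Lambda^T\eta$.

```python
import math

ETA = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def matvec(a, x):
    return [sum(a[i][k] * x[k] for k in range(4)) for i in range(4)]

def transpose(a):
    return [[a[j][i] for j in range(4)] for i in range(4)]

def act(t, x):
    """Apply x -> Lam x + a."""
    lam, a = t
    return [y + s for y, s in zip(matvec(lam, x), a)]

def compose(t_bar, t):
    """T(Lam_bar, a_bar) T(Lam, a) = T(Lam_bar Lam, Lam_bar a + a_bar)."""
    lam_bar, a_bar = t_bar
    lam, a = t
    shift = [y + s for y, s in zip(matvec(lam_bar, a), a_bar)]
    return (matmul(lam_bar, lam), shift)

def inverse(t):
    """T(Lam, a)^-1 = T(Lam^-1, -Lam^-1 a), with Lam^-1 = eta Lam^T eta."""
    lam, a = t
    lam_inv = matmul(ETA, matmul(transpose(lam), ETA))
    return (lam_inv, [-y for y in matvec(lam_inv, a)])

def boost_x(v):
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, g * v, 0, 0], [g * v, g, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1.0]]

t1 = (boost_x(0.6), [1.0, 2.0, 0.0, -1.0])
t2 = (rotation_z(0.4), [0.5, 0.0, 3.0, 1.0])
x = [0.3, -1.2, 2.0, 0.7]

sequential = act(t2, act(t1, x))            # apply t1, then t2
combined = act(compose(t2, t1), x)          # apply the composite once
round_trip = act(compose(inverse(t1), t1), x)   # should return x
```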
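Based on the discussion in the prior section, the transformations
$T(\Lambda,a)$ give rise to a unitary linear transformation acting
on vectors in the physical Hilbert space, represented as
$\Psi\to U(\Lambda,a)\Psi$. These operators $U$ obey
a composition law articulated as
$$U(\bar\Lambda,\bar a)\,U(\Lambda,a) = U(\bar\Lambda\Lambda,\ \bar\Lambda a+\bar a). \tag{1.3.11}$$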
It's worth noting that to prevent the emergence of a phase factor on the right-hand side of Eq. (1.3.11), it's generally required to extend the Lorentz group. The suitable extension for accomplishing this is discussed in Section ?.
The complete set of transformations $T(\Lambda,a)$ is formally
referred to as the inhomogeneous Lorentz group, also known as the
Poincaré group. This group has several significant subgroups.
First, transformations with $a^\mu=0$ naturally constitute
a subgroup, described by
$$T(\bar\Lambda,0)\,T(\Lambda,0) = T(\bar\Lambda\Lambda,0), \tag{1.3.12}$$
which is termed the homogeneous Lorentz group. Additionally, from Eq.
(1.3.9), it's evident that $\mathrm{Det}\,\Lambda$ can be either $+1$ or
$-1$; transformations having $\mathrm{Det}\,\Lambda=+1$ make up a
subgroup of either the homogeneous or inhomogeneous Lorentz groups.
Further scrutiny of the 00-components of Eqs. (1.3.5) and
(1.3.6) yields
$$(\Lambda^0{}_0)^2 = 1 + \Lambda^i{}_0\Lambda^i{}_0 = 1 + \Lambda^0{}_i\Lambda^0{}_i, \tag{1.3.13}$$
where $i$ is summed over 1, 2, and 3. This shows that
either $\Lambda^0{}_0\ge+1$ or $\Lambda^0{}_0\le-1$.
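Transformations where $\Lambda^0{}_0\ge+1$ constitute a subgroup.
Observe that if $\Lambda^\mu{}_\nu$ and $\bar\Lambda^\mu{}_\nu$ are
two such matrices, then
$$(\bar\Lambda\Lambda)^0{}_0 = \bar\Lambda^0{}_0\Lambda^0{}_0 + \bar\Lambda^0{}_1\Lambda^1{}_0 + \bar\Lambda^0{}_2\Lambda^2{}_0 + \bar\Lambda^0{}_3\Lambda^3{}_0.$$
According to Eq. (1.3.13), the three-vector
$(\Lambda^1{}_0,\Lambda^2{}_0,\Lambda^3{}_0)$ has a length of
$\sqrt{(\Lambda^0{}_0)^2-1}$, and similarly, the three-vector
$(\bar\Lambda^0{}_1,\bar\Lambda^0{}_2,\bar\Lambda^0{}_3)$ has a length of
$\sqrt{(\bar\Lambda^0{}_0)^2-1}$. Therefore, the scalar
product of these two three-vectors has an upper limit given by
$$\big|\bar\Lambda^0{}_1\Lambda^1{}_0 + \bar\Lambda^0{}_2\Lambda^2{}_0 + \bar\Lambda^0{}_3\Lambda^3{}_0\big| \le \sqrt{(\Lambda^0{}_0)^2-1}\,\sqrt{(\bar\Lambda^0{}_0)^2-1}, \tag{1.3.14}$$
leading to
$$(\bar\Lambda\Lambda)^0{}_0 \ge \bar\Lambda^0{}_0\Lambda^0{}_0 - \sqrt{(\Lambda^0{}_0)^2-1}\,\sqrt{(\bar\Lambda^0{}_0)^2-1} \ge 1.$$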
This subgroup, characterized by $\mathrm{Det}\,\Lambda=+1$ and
$\Lambda^0{}_0\ge+1$, is identified as the proper orthochronous
Lorentz group. As one cannot smoothly transition from
$\mathrm{Det}\,\Lambda=+1$ to $\mathrm{Det}\,\Lambda=-1$, or from
$\Lambda^0{}_0\ge+1$ to $\Lambda^0{}_0\le-1$, any Lorentz
transformation derived from the identity through a continuous variation
of parameters must share the same sign for $\mathrm{Det}\,\Lambda$ and
$\Lambda^0{}_0$ as the identity, and thus must be a member of
the proper orthochronous Lorentz group.
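Every Lorentz transformation falls into one of two categories: it is
either proper and orthochronous, or it can be expressed as the
composition of an element from the proper orthochronous Lorentz group
and one of the discrete transformations $\mathcal{P}$ or $\mathcal{T}$ or
$\mathcal{P}\mathcal{T}$. Here, $\mathcal{P}$ represents the space
inversion, which has non-zero elements given by
$$\mathcal{P}^0{}_0 = 1, \qquad \mathcal{P}^1{}_1 = \mathcal{P}^2{}_2 = \mathcal{P}^3{}_3 = -1, \tag{1.3.15}$$
while $\mathcal{T}$ stands for the time-reversal matrix, with
non-zero elements defined as
$$\mathcal{T}^0{}_0 = -1, \qquad \mathcal{T}^1{}_1 = \mathcal{T}^2{}_2 = \mathcal{T}^3{}_3 = 1. \tag{1.3.16}$$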
Therefore, a comprehensive understanding of the entire Lorentz group can be achieved by studying its proper orthochronous subgroup, along with the concepts of space inversion and time-reversal. The exploration of space inversion and time-reversal will be carried out separately in Section ?. Until that point, our focus will remain on either the homogeneous or inhomogeneous proper orthochronous Lorentz group.
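The four disconnected pieces of the Lorentz group can be told apart by the two signs discussed above. The following sketch, under the assumption of index order $(0,1,2,3)$ and with hypothetical function names, classifies a Lorentz matrix by the sign of its determinant and of its 00-component.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def classify(lam):
    """Label a Lorentz matrix by the signs of Det(Lam) and Lam^0_0."""
    proper = "proper" if det(lam) > 0 else "improper"
    ortho = "orthochronous" if lam[0][0] >= 1 else "non-orthochronous"
    return proper + " " + ortho

def diag(d):
    return [[d[i] if i == j else 0.0 for j in range(4)] for i in range(4)]

def boost_x(v):
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, g * v, 0, 0], [g * v, g, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

identity = diag([1.0, 1.0, 1.0, 1.0])
space_inversion = diag([1.0, -1.0, -1.0, -1.0])   # the matrix P of (1.3.15)
time_reversal = diag([-1.0, 1.0, 1.0, 1.0])       # the matrix T of (1.3.16)
both = diag([-1.0, -1.0, -1.0, -1.0])             # the product PT

labels = {
    "identity": classify(identity),
    "boost": classify(boost_x(0.6)),
    "P": classify(space_inversion),
    "T": classify(time_reversal),
    "PT": classify(both),
}
```

Note that $\mathcal{P}\mathcal{T}$ has determinant $+1$ but reverses the sign of $\Lambda^0{}_0$, so it is proper without being orthochronous.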
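As discussed in Section 1.2, many essential attributes of
any Lie symmetry group are encapsulated in the properties of the
elements in the vicinity of the identity element. In the context of the
inhomogeneous Lorentz group, the identity transformation is given by
$\Lambda^\mu{}_\nu=\delta^\mu{}_\nu$ and $a^\mu=0$.
Therefore, we aim to explore transformations that can be written as
$$\Lambda^\mu{}_\nu = \delta^\mu{}_\nu + \omega^\mu{}_\nu, \qquad a^\mu = \epsilon^\mu, \tag{1.4.1}$$
where both $\omega^\mu{}_\nu$ and $\epsilon^\mu$ are
infinitesimal. The Lorentz condition, expressed as equation (1.3.5),
can be rewritten as
$$\eta_{\rho\sigma} = \eta_{\mu\nu}\big(\delta^\mu{}_\rho+\omega^\mu{}_\rho\big)\big(\delta^\nu{}_\sigma+\omega^\nu{}_\sigma\big) = \eta_{\sigma\rho} + \omega_{\sigma\rho} + \omega_{\rho\sigma} + O(\omega^2).$$
In this book, we adopt the convention that indices can be raised or
lowered by contracting with $\eta^{\mu\nu}$ or $\eta_{\mu\nu}$:
$$\omega_{\sigma\rho} \equiv \eta_{\mu\sigma}\,\omega^\mu{}_\rho, \qquad \omega^\mu{}_\rho \equiv \eta^{\mu\sigma}\,\omega_{\sigma\rho}.$$
If we retain only the first-order terms in $\omega$ in
the Lorentz condition (1.3.5), we find that this condition
simplifies to the antisymmetry of $\omega_{\mu\nu}$:
$$\omega_{\mu\nu} = -\omega_{\nu\mu}. \tag{1.4.2}$$
An antisymmetric second-rank tensor in four dimensions has
$(4\times3)/2=6$ independent components. Coupled with the four components
of $\epsilon^\mu$, an inhomogeneous Lorentz
transformation is thus characterized by $6+4=10$ parameters.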
Because $U(1,0)$ maps any ray onto itself, it must be
proportional to the unit operator, and by a choice of phase may be made
equal to it. Excluding the presence of superselection rules, we can
eliminate the chance that this proportionality factor varies depending
on the state acted upon by $U(1,0)$.
This exclusion follows the same logic we applied in Section 1.2
to dismiss the idea that phases in projective representations of
symmetry groups might depend on the states they act upon. In cases where
superselection rules are relevant, it could be necessary to adjust the
phase factors of $U(1,0)$ depending on the sector it acts on.
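For an infinitesimal Lorentz transformation as described by equation (1.4.1),
$U(1+\omega,\epsilon)$ must be equal to the unit operator augmented by terms
that are linear in $\omega_{\rho\sigma}$ and $\epsilon_\rho$.
We express this relationship as
$$U(1+\omega,\epsilon) = 1 + \tfrac12 i\,\omega_{\rho\sigma}J^{\rho\sigma} - i\,\epsilon_\rho P^\rho + \cdots. \tag{1.4.3}$$
In this equation, $J^{\rho\sigma}$ and $P^\rho$ are operators that are
independent of $\omega$ and $\epsilon$, and the ellipsis signifies terms
of higher order in $\omega$ and/or $\epsilon$. For $U(1+\omega,\epsilon)$
to be unitary, operators $J^{\rho\sigma}$ and $P^\rho$ must be
Hermitian:
$$J^{\rho\sigma\dagger} = J^{\rho\sigma}, \qquad P^{\rho\dagger} = P^\rho. \tag{1.4.4}$$
(Yes, the generators of boosts are observables.) Given that
$\omega_{\rho\sigma}$ is antisymmetric, its coefficient $J^{\rho\sigma}$
can also be taken to be antisymmetric:
$$J^{\rho\sigma} = -J^{\sigma\rho}. \tag{1.4.5}$$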
As we will elaborate on later, $P^1$, $P^2$, and $P^3$ are the components of
the momentum operators; $J^{23}$, $J^{31}$, and $J^{12}$ are the angular
momentum vector components; and $P^0$ is the energy operator or Hamiltonian.
These identifications of angular-momentum generators are necessitated by the
commutation relations of $J^{\mu\nu}$.
However, the commutation relations do not distinguish between $P^\mu$ and
$-P^\mu$, making the sign choice for the $\epsilon_\rho P^\rho$ term in
equation (1.4.3) a matter of convention. The alignment of this
choice with the standard definition of the Hamiltonian $P^0$
will be clarified in Section ?.
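We turn our attention to the Lorentz transformation characteristics of
$J^{\rho\sigma}$ and $P^\rho$.
We focus on the composite expression
$$U(\Lambda,a)\,U(1+\omega,\epsilon)\,U^{-1}(\Lambda,a),$$
where $\Lambda^\mu{}_\nu$ and $a^\mu$ are parameters
of a new transformation, distinct from $\omega$ and $\epsilon$. According to
Equation (1.3.11), the product $U(\Lambda^{-1},-\Lambda^{-1}a)\,U(\Lambda,a)$
results in $U(1,0)$, signifying that $U(\Lambda^{-1},-\Lambda^{-1}a)$ serves
as the inverse of $U(\Lambda,a)$.
Consequently, from (1.3.11), we obtain:
$$U(\Lambda,a)\,U(1+\omega,\epsilon)\,U^{-1}(\Lambda,a) = U\big(\Lambda(1+\omega)\Lambda^{-1},\ \Lambda\epsilon-\Lambda\omega\Lambda^{-1}a\big). \tag{1.4.6}$$
To the first order in $\omega$ and $\epsilon$, this leads to:
$$U(\Lambda,a)\big[\tfrac12\omega_{\rho\sigma}J^{\rho\sigma}-\epsilon_\rho P^\rho\big]U^{-1}(\Lambda,a) = \tfrac12\big(\Lambda\omega\Lambda^{-1}\big)_{\mu\nu}J^{\mu\nu} - \big(\Lambda\epsilon-\Lambda\omega\Lambda^{-1}a\big)_\mu P^\mu. \tag{1.4.7}$$
By matching the coefficients of $\omega_{\rho\sigma}$ and $\epsilon_\rho$
on both sides of the equation and employing (1.3.10),
we arrive at:
$$U(\Lambda,a)\,J^{\rho\sigma}\,U^{-1}(\Lambda,a) = \Lambda_\mu{}^\rho\Lambda_\nu{}^\sigma\big(J^{\mu\nu}-a^\mu P^\nu+a^\nu P^\mu\big), \tag{1.4.8}$$
$$U(\Lambda,a)\,P^\rho\,U^{-1}(\Lambda,a) = \Lambda_\mu{}^\rho P^\mu. \tag{1.4.9}$$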
In the case of homogeneous Lorentz transformations where
$a^\mu=0$, these transformation laws simply indicate
that $J^{\mu\nu}$ behaves as a tensor and $P^\mu$ as a vector. For pure
translations, where $\Lambda^\mu{}_\nu=\delta^\mu{}_\nu$,
these rules convey that $P^\rho$ remains invariant under
translation, while $J^{\rho\sigma}$ does not. Specifically, the
alteration in the spatial components of $J^{\rho\sigma}$ due to a
spatial translation corresponds to the conventional change in angular
momentum when the point of reference for measuring angular momentum is
shifted.
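Next, we consider the application of rules (1.4.8) and (1.4.9) to an infinitesimal transformation. Specifically, we
take $\Lambda^\mu{}_\nu=\delta^\mu{}_\nu+\omega^\mu{}_\nu$ and
$a^\mu=\epsilon^\mu$, where the infinitesimals $\omega^\mu{}_\nu$ and
$\epsilon^\mu$ are not related to the earlier $\omega$ and $\epsilon$.
Utilizing Equation (1.4.3) and retaining only first-order terms in
$\omega^\mu{}_\nu$ and $\epsilon^\mu$, Equations (1.4.8)
and (1.4.9) simplify to:
$$i\big[\tfrac12\omega_{\mu\nu}J^{\mu\nu}-\epsilon_\mu P^\mu,\ J^{\rho\sigma}\big] = \omega_\mu{}^\rho J^{\mu\sigma} + \omega_\nu{}^\sigma J^{\rho\nu} - \epsilon^\rho P^\sigma + \epsilon^\sigma P^\rho, \tag{1.4.10}$$
$$i\big[\tfrac12\omega_{\mu\nu}J^{\mu\nu}-\epsilon_\mu P^\mu,\ P^\rho\big] = \omega_\mu{}^\rho P^\mu. \tag{1.4.11}$$
By isolating the coefficients of $\omega_{\mu\nu}$ and $\epsilon_\mu$
on both sides of these equations, we derive the
commutation relations:
$$i[J^{\mu\nu},J^{\rho\sigma}] = \eta^{\nu\rho}J^{\mu\sigma} - \eta^{\mu\rho}J^{\nu\sigma} - \eta^{\sigma\mu}J^{\rho\nu} + \eta^{\sigma\nu}J^{\rho\mu}, \tag{1.4.12}$$
$$i[P^\mu,J^{\rho\sigma}] = \eta^{\mu\rho}P^\sigma - \eta^{\mu\sigma}P^\rho, \tag{1.4.13}$$
$$[P^\mu,P^\rho] = 0. \tag{1.4.14}$$
These equations define the Lie algebra of the Poincaré group.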
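In quantum mechanics, particular importance is given to those operators
that are conserved, meaning they commute with the energy operator
$H=P^0$. A review of Equations (1.4.13)
and (1.4.14) reveals that these conserved operators include
the momentum three-vector
$$\mathbf{P} = \{P^1,P^2,P^3\} \tag{1.4.15}$$
and the angular-momentum three-vector
$$\mathbf{J} = \{J^{23},J^{31},J^{12}\}, \tag{1.4.16}$$
as well as the energy $P^0$ itself. The other
generators constitute what is termed the 'boost' three-vector
$$\mathbf{K} = \{J^{01},J^{02},J^{03}\}. \tag{1.4.17}$$
These are not conserved, which is why their eigenvalues are not employed to characterize physical states. Expressed in a three-dimensional notation, the commutation relations (1.4.12), (1.4.13), and (1.4.14) can be represented as:
$$[J_i,J_j] = i\epsilon_{ijk}J_k, \tag{1.4.18}$$
$$[J_i,K_j] = i\epsilon_{ijk}K_k, \tag{1.4.19}$$
$$[K_i,K_j] = -i\epsilon_{ijk}J_k, \tag{1.4.20}$$
$$[J_i,P_j] = i\epsilon_{ijk}P_k, \tag{1.4.21}$$
$$[K_i,P_j] = -iH\delta_{ij}, \tag{1.4.22}$$
$$[J_i,H] = [P_i,H] = [H,H] = 0, \tag{1.4.23}$$
$$[K_i,H] = -iP_i. \tag{1.4.24}$$
Here, $i$, $j$, $k$ take the values 1, 2, and 3, and $\epsilon_{ijk}$
is the completely antisymmetric quantity where $\epsilon_{123}=+1$. The
commutation relation (1.4.18) is identified as that of the angular-momentum
operator.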
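The Lorentz part of this algebra can be checked in the simplest representation, the $4\times4$ matrices acting on four-vectors. The sketch below is an illustration under stated assumptions: it uses index order $(0,1,2,3)$, and it defines the matrix generators by $(J^{\rho\sigma})^\mu{}_\nu=-i(\eta^{\rho\mu}\delta^\sigma{}_\nu-\eta^{\sigma\mu}\delta^\rho{}_\nu)$, which is the choice that makes $1+\tfrac12 i\,\omega_{\rho\sigma}J^{\rho\sigma}$ reproduce $\delta^\mu{}_\nu+\omega^\mu{}_\nu$. With $J_i=\tfrac12\epsilon_{ijk}J^{jk}$ and $K_i=J^{0i}$, the code tests the relation $i[J^{\mu\nu},J^{\rho\sigma}]=\eta^{\nu\rho}J^{\mu\sigma}-\eta^{\mu\rho}J^{\nu\sigma}-\eta^{\sigma\mu}J^{\rho\nu}+\eta^{\sigma\nu}J^{\rho\mu}$ and its three-dimensional forms.

```python
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def generator(rho, sigma):
    """Matrix of J^{rho sigma} in the four-vector representation."""
    return [[-1j * (ETA[rho][mu] * (1 if sigma == nu else 0)
                    - ETA[sigma][mu] * (1 if rho == nu else 0))
             for nu in range(4)] for mu in range(4)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(4)] for i in range(4)]

def max_abs(m):
    return max(abs(x) for row in m for x in row)

def combine(terms):
    """Linear combination sum(c * m) of 4x4 matrices."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(4)]
            for i in range(4)]

def lorentz_algebra_residual():
    """Largest violation of the covariant commutation relation."""
    worst = 0.0
    for mu in range(4):
        for nu in range(4):
            for rho in range(4):
                for sig in range(4):
                    lhs = combine([(1j, commutator(generator(mu, nu),
                                                   generator(rho, sig)))])
                    rhs = combine([(ETA[nu][rho], generator(mu, sig)),
                                   (-ETA[mu][rho], generator(nu, sig)),
                                   (-ETA[sig][mu], generator(rho, nu)),
                                   (ETA[sig][nu], generator(rho, mu))])
                    diff = combine([(1, lhs), (-1, rhs)])
                    worst = max(worst, max_abs(diff))
    return worst

def eps(i, j, k):
    """Levi-Civita symbol for indices 0, 1, 2 (standing for 1, 2, 3)."""
    return (i - j) * (j - k) * (k - i) / 2

J = [generator(2, 3), generator(3, 1), generator(1, 2)]
K = [generator(0, 1), generator(0, 2), generator(0, 3)]

def three_vector_residual(a, b, sign, c):
    """Largest violation of [a_i, b_j] = sign * i * eps_ijk * c_k."""
    worst = 0.0
    for i in range(3):
        for j in range(3):
            rhs = combine([(sign * 1j * eps(i, j, k), c[k]) for k in range(3)])
            diff = combine([(1, commutator(a[i], b[j])), (-1, rhs)])
            worst = max(worst, max_abs(diff))
    return worst

covariant_residual = lorentz_algebra_residual()
jj_residual = three_vector_residual(J, J, +1, J)   # [J_i, J_j] =  i eps J_k
jk_residual = three_vector_residual(J, K, +1, K)   # [J_i, K_j] =  i eps K_k
kk_residual = three_vector_residual(K, K, -1, J)   # [K_i, K_j] = -i eps J_k
```

The minus sign in the boost commutator is what distinguishes the Lorentz algebra from that of a four-dimensional rotation group.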
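Let us prove (1.4.22) and (1.4.24). From $H=P^0$, (1.4.13), (1.4.15),
and (1.4.17), with $K_i=J^{0i}$, we have
$$i[P^j,J^{0i}] = \eta^{j0}P^i - \eta^{ji}P^0 = -\delta_{ij}H, \qquad i[P^0,J^{0i}] = \eta^{00}P^i - \eta^{0i}P^0 = -P^i,$$
so that $[K_i,P_j]=-iH\delta_{ij}$ and $[K_i,H]=-iP_i$.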
The subgroup of pure translations $T(1,a)$ is a part of
the inhomogeneous Lorentz group, and its group multiplication rule, as
defined by (1.3.7), is
$$T(1,\bar a)\,T(1,a) = T(1,\bar a+a). \tag{1.4.25}$$
This multiplication rule is additive, similar to what is described in Equation (1.2.24). Employing Equation (1.4.3) and revisiting the arguments that led to Equation (1.2.26), we determine that finite translations in the physical Hilbert space are represented as
$$U(1,a) = \exp(-iP^\mu a_\mu). \tag{1.4.26}$$
Likewise, a rotation $R_{\boldsymbol\theta}$ through an angle
$|\boldsymbol\theta|$ around the direction specified by $\boldsymbol\theta$
is represented in the physical Hilbert space as
$$U(R_{\boldsymbol\theta},0) = \exp(i\,\mathbf{J}\cdot\boldsymbol\theta). \tag{1.4.27}$$
Contrasting the Poincaré algebra with the Lie algebra of the
Galilean group, the symmetry group for Newtonian mechanics, offers
fascinating insights. While it is possible to derive the Galilean
algebra beginning with its transformation rules and using the same
methodology we used for the Poincaré algebra, a simpler path
exists. Since we already possess Eqs. (1.4.18)-(1.4.24),
we can more conveniently obtain the Galilean algebra as the
Inönü-Wigner contraction of the Poincaré algebra in the
low-velocity limit. For a set of particles with an average mass $m$
and velocity $v$, we anticipate the momentum and angular-momentum operators
to be of the order $\mathbf{J}\sim1$, $\mathbf{P}\sim mv$. On the other
hand, the energy operator is $H=M+W$, composed of a total mass $M$
and a non-mass energy $W$ (kinetic and potential),
which are of the order $M\sim m$, $W\sim mv^2$. Examining Eqs.
(1.4.18)-(1.4.24) reveals that in the limit where
$v\ll1$, the commutation relations simplify to:
$$\begin{aligned}
&[J_i,J_j] = i\epsilon_{ijk}J_k, \qquad [J_i,K_j] = i\epsilon_{ijk}K_k, \qquad [K_i,K_j] = 0,\\
&[J_i,P_j] = i\epsilon_{ijk}P_k, \qquad [K_i,P_j] = -iM\delta_{ij},\\
&[J_i,W] = [P_i,W] = 0, \qquad [K_i,W] = -iP_i,\\
&[J_i,M] = [P_i,M] = [K_i,M] = [W,M] = 0,
\end{aligned}$$
where $\mathbf{K}$ scales as $1/v$.
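It's noteworthy that in Hilbert space, the sequence of operations
involving a translation $\mathbf{x}\to\mathbf{x}+\mathbf{a}$ and a 'boost'
$\mathbf{x}\to\mathbf{x}+\mathbf{v}t$ does not yield the expected
transformation $\mathbf{x}\to\mathbf{x}+\mathbf{v}t+\mathbf{a}$. Instead, we
have:
$$\exp(-i\mathbf{K}\cdot\mathbf{v})\,\exp(-i\mathbf{P}\cdot\mathbf{a}) = \exp(iM\,\mathbf{a}\cdot\mathbf{v}/2)\,\exp\big(-i(\mathbf{K}\cdot\mathbf{v}+\mathbf{P}\cdot\mathbf{a})\big).$$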
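The extra factor has a simple algebraic origin: when the commutator $[A,B]$ commutes with both $A$ and $B$, the Baker-Campbell-Hausdorff series terminates and $e^Ae^B=e^{A+B+\frac12[A,B]}$. The sketch below illustrates this with $3\times3$ strictly upper-triangular matrices, which are only hypothetical stand-ins for the boost and translation generators (their commutator is central, like the mass term above); for such matrices the exponential series stops after the quadratic term.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def add(*ms):
    return [[sum(m[i][j] for m in ms) for j in range(3)] for i in range(3)]

def scale(c, m):
    return [[c * m[i][j] for j in range(3)] for i in range(3)]

IDENTITY = [[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]]

def exp_nilpotent(x):
    """exp(x) = 1 + x + x^2/2 for strictly upper-triangular 3x3 x (x^3 = 0)."""
    return add(IDENTITY, x, scale(0.5, matmul(x, x)))

def commutator(a, b):
    return add(matmul(a, b), scale(-1.0, matmul(b, a)))

def max_difference(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

# Stand-ins whose commutator is central (sample numbers are arbitrary).
a_par, b_par = 0.7, -1.3
A = [[0, a_par, 0], [0, 0, 0], [0, 0, 0]]
B = [[0, 0, 0], [0, 0, b_par], [0, 0, 0]]
C = commutator(A, B)   # proportional to the corner element, commutes with A and B

product = matmul(exp_nilpotent(A), exp_nilpotent(B))
naive = exp_nilpotent(add(A, B))                       # ignores the commutator
corrected = exp_nilpotent(add(A, B, scale(0.5, C)))    # includes [A, B]/2

naive_error = max_difference(product, naive)
corrected_error = max_difference(product, corrected)
```

Here `naive_error` equals half the size of the commutator, while `corrected_error` vanishes up to rounding, mirroring the way the mass-dependent phase appears in the Galilean case.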
The emergence of the phase factor $\exp(iM\,\mathbf{a}\cdot\mathbf{v}/2)$
indicates that
we are dealing with a projective representation, which comes with a
superselection rule that precludes the mixing of states with different
masses. In this aspect, the mathematical framework of the
Poincaré group is less complex than that of the Galilean group.
Nonetheless, it is entirely feasible to extend the Galilean group
formally by introducing an additional generator to its Lie algebra. This
new generator would commute with all existing generators and have
eigenvalues corresponding to the masses of the different states. In such
a scenario, physical states would be represented through an ordinary,
rather than projective, representation of the augmented symmetry group.
While this might seem like a minor change in notation, it effectively
eliminates the necessity for a mass superselection rule within the
reinterpreted Galilean group.
We turn our attention to the categorization of single-particle states based on their transformation properties under the inhomogeneous Lorentz group.
Given that the components of the energy-momentum four-vector commute
among themselves, it is logical to represent physical state-vectors
using eigenvectors of the four-momentum. To do this, we introduce a
label $\sigma$ to account for any additional degrees of
freedom, leading us to consider state-vectors $\Psi_{p,\sigma}$
such that
$$P^\mu\Psi_{p,\sigma} = p^\mu\Psi_{p,\sigma}. \tag{1.5.1}$$
For more complex states, like those comprising multiple free particles,
the label $\sigma$ would need to accommodate both
continuous and discrete values. In this discussion, we are focusing
solely on one-particle states, whose definition includes that the label
$\sigma$ is purely discrete. It is worth noting that
specific bound states of two or more particles, like the ground state of
a hydrogen atom, are also considered one-particle states in this
context. While such states are not elementary particles, the distinction
between composite and elementary particles is irrelevant for our current
purposes.
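Equations (1.5.1) and (1.4.26) inform us about the transformation behavior of these states under translations: $U(1,a)\Psi_{p,\sigma}=e^{-ip\cdot a}\Psi_{p,\sigma}$. We must now consider how they transform under homogeneous Lorentz transformations.
Applying equation (1.4.9), we find that when a quantum
homogeneous Lorentz transformation $U(\Lambda,0)$, or
equivalently $U(\Lambda)$, acts on $\Psi_{p,\sigma}$, it yields a
four-momentum eigenvector with eigenvalue $\Lambda p$:
$$P^\mu\,U(\Lambda)\Psi_{p,\sigma} = U(\Lambda)\big[U^{-1}(\Lambda)P^\mu U(\Lambda)\big]\Psi_{p,\sigma} = U(\Lambda)\big((\Lambda^{-1})_\rho{}^\mu P^\rho\big)\Psi_{p,\sigma} = \Lambda^\mu{}_\rho\,p^\rho\,U(\Lambda)\Psi_{p,\sigma}. \tag{1.5.2}$$
Therefore, $U(\Lambda)\Psi_{p,\sigma}$ must be expressible as a linear
combination of state-vectors $\Psi_{\Lambda p,\sigma'}$:
$$U(\Lambda)\Psi_{p,\sigma} = \sum_{\sigma'}C_{\sigma'\sigma}(\Lambda,p)\,\Psi_{\Lambda p,\sigma'}. \tag{1.5.3}$$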
Generally, one might be able to construct suitable linear combinations
of $\Psi_{p,\sigma}$ such that the matrix $C_{\sigma'\sigma}(\Lambda,p)$
becomes block-diagonal. In other words, the $\Psi_{p,\sigma}$ with
$\sigma$ values within a single block could constitute a
representation of the inhomogeneous Lorentz group on their own. It makes
sense to associate the states of a particular particle type with
components of an irreducible representation of the inhomogeneous Lorentz
group, meaning one that can't be further broken down in this manner.
It should be noted that different types of particles may be related to
isomorphic representations, which means their matrices
$C_{\sigma'\sigma}(\Lambda,p)$ could be identical or transformed into one
another by a similarity transformation. In certain scenarios, particle types
might be defined as irreducible representations of larger groups, which
include the inhomogeneous proper orthochronous Lorentz group as a subgroup.
For example, for massless particles whose interactions are invariant under
space inversion, it's common to treat all components of an irreducible
representation of the inhomogeneous Lorentz group including space inversion
as a single particle type.
The next step in our investigation is to elucidate the structure of the
coefficients $C_{\sigma'\sigma}(\Lambda,p)$ in irreducible representations
of the inhomogeneous Lorentz group.
For our objectives, it's crucial to recognize that the only functions of
$p^\mu$ left invariant by all proper orthochronous
Lorentz transformations $\Lambda^\mu{}_\nu$ are the invariant square
$p^2\equiv\eta_{\mu\nu}p^\mu p^\nu$, and for $p^2\le0$, also the sign of
$p^0$. Therefore, for each specific value of $p^2$, and when $p^2\le0$,
each sign of $p^0$, we can select a 'standard' four-momentum denoted as
$k^\mu$. Any $p^\mu$ within this category can then be represented as
$$p^\mu = L^\mu{}_\nu(p)\,k^\nu, \tag{1.5.4}$$
where $L^\mu{}_\nu$ is a particular standard Lorentz
transformation depending on $p^\mu$ and, implicitly, on
our chosen standard $k^\mu$.
Consequently, the states $\Psi_{p,\sigma}$ having momentum $p$
can be defined as
$$\Psi_{p,\sigma} \equiv N(p)\,U(L(p))\,\Psi_{k,\sigma}, \tag{1.5.5}$$
where $N(p)$ is a numerical normalization factor, the
specifics of which will be determined later. Up to this juncture, no
details have been provided about how the $\sigma$ labels
are connected across varying momenta; Equation (1.5.5) now
addresses this absence.
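When applying an arbitrary homogeneous Lorentz transformation
$U(\Lambda)$ to equation (1.5.5), we obtain:
$$U(\Lambda)\Psi_{p,\sigma} = N(p)\,U(\Lambda L(p))\,\Psi_{k,\sigma} = N(p)\,U(L(\Lambda p))\,U\big(L^{-1}(\Lambda p)\Lambda L(p)\big)\Psi_{k,\sigma}. \tag{1.5.6}$$
The purpose of this last step is to show that the Lorentz transformation
$L^{-1}(\Lambda p)\Lambda L(p)$ first maps $k$ to $L(p)k=p$, then to
$\Lambda p$, and finally back to $k$.
This transformation belongs to a subgroup within the homogeneous Lorentz
group, characterized by Lorentz transformations $W^\mu{}_\nu$
that keep $k^\mu$ invariant:
$$W^\mu{}_\nu k^\nu = k^\mu. \tag{1.5.7}$$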
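This subgroup is termed the little group. For any $W$
that satisfies Equation (1.5.7), we find that:
$$U(W)\Psi_{k,\sigma} = \sum_{\sigma'}D_{\sigma'\sigma}(W)\,\Psi_{k,\sigma'}. \tag{1.5.8}$$
The coefficients $D(W)$ serve as a representation of
the little group. Specifically, for any elements $W$, $\bar W$,
the relationship
$$\sum_{\sigma'}D_{\sigma'\sigma}(\bar WW)\Psi_{k,\sigma'} = U(\bar WW)\Psi_{k,\sigma} = U(\bar W)U(W)\Psi_{k,\sigma} = \sum_{\sigma'\sigma''}D_{\sigma''\sigma}(W)\,D_{\sigma'\sigma''}(\bar W)\,\Psi_{k,\sigma'}$$
is satisfied, and hence
$$D_{\sigma'\sigma}(\bar WW) = \sum_{\sigma''}D_{\sigma'\sigma''}(\bar W)\,D_{\sigma''\sigma}(W). \tag{1.5.9}$$
Particularly, we can apply Equation (1.5.8) to the little-group transformation
$$W(\Lambda,p) \equiv L^{-1}(\Lambda p)\,\Lambda\,L(p), \tag{1.5.10}$$
resulting in:
$$U(\Lambda)\Psi_{p,\sigma} = N(p)\sum_{\sigma'}D_{\sigma'\sigma}\big(W(\Lambda,p)\big)\,U(L(\Lambda p))\,\Psi_{k,\sigma'},$$
or, recalling definition (1.5.5):
$$U(\Lambda)\Psi_{p,\sigma} = \frac{N(p)}{N(\Lambda p)}\sum_{\sigma'}D_{\sigma'\sigma}\big(W(\Lambda,p)\big)\,\Psi_{\Lambda p,\sigma'}. \tag{1.5.11}$$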
Aside from normalization issues, the task of identifying the
coefficients $C_{\sigma'\sigma}$ in transformation rule (1.5.3)
has now been distilled down to finding the representations of the little
group. This technique, which involves deriving representations of a
larger group like the inhomogeneous Lorentz group from the
representations of its little group, is known as the method of induced
representations.
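The little-group element $L^{-1}(\Lambda p)\,\Lambda\,L(p)$ can be computed explicitly in the massive case. The sketch below rests on assumptions that go beyond this passage: the standard momentum is taken as a particle at rest, $k=(M,0,0,0)$ in index order $(0,1,2,3)$, and $L(p)$ is taken to be the pure boost carrying $k$ to $p$. With those choices the resulting matrix should leave $k$ fixed and act as a rotation on the spatial components.

```python
import math

ETA = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
M = 1.0
K_STANDARD = [M, 0.0, 0.0, 0.0]   # assumed standard momentum: particle at rest

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def matvec(a, x):
    return [sum(a[i][k] * x[k] for k in range(4)) for i in range(4)]

def transpose(a):
    return [[a[j][i] for j in range(4)] for i in range(4)]

def lorentz_inverse(lam):
    """Inverse of a Lorentz matrix: eta Lam^T eta."""
    return matmul(ETA, matmul(transpose(lam), ETA))

def standard_boost(p_vec):
    """Pure boost L(p) taking K_STANDARD to (sqrt(p^2 + M^2), p)."""
    p2 = sum(x * x for x in p_vec)
    gamma = math.sqrt(p2 + M * M) / M
    lam = [[0.0] * 4 for _ in range(4)]
    lam[0][0] = gamma
    for i in range(3):
        lam[0][i + 1] = lam[i + 1][0] = p_vec[i] / M
        for j in range(3):
            lam[i + 1][j + 1] = 1.0 if i == j else 0.0
            if p2 > 0:
                lam[i + 1][j + 1] += (gamma - 1.0) * p_vec[i] * p_vec[j] / p2
    return lam

def wigner_rotation(lam, p_vec):
    """W(Lam, p) = L^-1(Lam p) Lam L(p)."""
    l_p = standard_boost(p_vec)
    lam_p = matvec(lam, matvec(l_p, K_STANDARD))   # four-momentum Lam p
    l_lam_p = standard_boost(lam_p[1:])
    return matmul(lorentz_inverse(l_lam_p), matmul(lam, l_p))

def boost_x(v):
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, g * v, 0, 0], [g * v, g, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

W = wigner_rotation(boost_x(0.6), [0.3, 0.4, 1.2])
W_k = matvec(W, K_STANDARD)                       # should equal K_STANDARD
R = [row[1:] for row in W[1:]]                    # spatial 3x3 block
RRt = [[sum(R[i][k] * R[j][k] for k in range(3)) for j in range(3)]
       for i in range(3)]                         # should be the unit matrix
```

For a boost that is not parallel to the momentum, `R` differs from the unit matrix; it is this rotation whose representation matrices $D_{\sigma'\sigma}$ enter the transformation rule.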
Table 1.5.1 provides a suitable selection for the standard
four-momentum $k^\mu$ along with the associated little
group for different categories of four-momenta.
Out of the six categories of four-momenta, only types (a), (c), and (f)
have any recognized implications for physical states. Class (f), where
$p^\mu=0$, pertains to the vacuum state, which is essentially unchanged by
$U(\Lambda)$. Our subsequent discussion will be
confined to cases (a) and (c), which correspond to particles with mass
and massless particles, respectively.
Now is an appropriate time to discuss the normalization of these states.
Employing the standard orthonormalization procedure from quantum
mechanics, we can select states with standard momentum $k^\mu$
to be orthonormal as denoted by the equation:
$$(\Psi_{k',\sigma'},\Psi_{k,\sigma}) = \delta^3(\mathbf{k}'-\mathbf{k})\,\delta_{\sigma'\sigma}. \tag{1.5.12}$$
(Let me remark that $k$ is the standard momentum and $k'$ runs over all
possibilities such that $k'^2=k^2$, so, for example, we can not use
(1.5.12) to calculate $(\Psi_{p',\sigma'},\Psi_{p,\sigma})$ for general
$p$ and $p'$. Also $\Psi_{k,\sigma}$ and $\Psi_{k',\sigma'}$ are normalized
such that (1.5.12) holds.) The presence of the delta
function arises because $\Psi_{k,\sigma}$ and $\Psi_{k',\sigma'}$
are eigenstates of a Hermitian operator with eigenvalues $\mathbf{k}$
and $\mathbf{k}'$, respectively. As a direct outcome, the representation of
the little group in Eqs. (1.5.8) and (1.5.11) must be unitary:
$$D^\dagger(W) = D^{-1}(W). \tag{1.5.13}$$
For $p^2>0$ and $p^\mu=0$, the little groups $SO(2,1)$ and $SO(3,1)$ do
not possess any non-trivial finite-dimensional unitary representations.
Hence, if there were states with a specific momentum $p^\mu$
having $p^2>0$ or $p^\mu=0$ that
non-trivially transform under the little group, an infinite number of
such states would be required.
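For the scalar products at generic momenta, the unitarity of the
operator $U(L(p))$, together with Eqs. (1.5.5)
and (1.5.11), gives the following formula for the scalar
product:
$$(\Psi_{p',\sigma'},\,\Psi_{p,\sigma}) = N(p)\,\big(U^{-1}(L(p))\,\Psi_{p',\sigma'},\,\Psi_{k,\sigma}\big) = N(p)\,N^*(p')\,D^*_{\sigma\sigma'}\big(W(L^{-1}(p),p')\big)\,\delta^3(\mathbf{k}'-\mathbf{k}).$$
Here, $k' \equiv L^{-1}(p)\,p'$.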
(Hence,
).
(Let me remark that
here is just the one in
although here we use
This is correct as
which gives
thereby getting
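Since $k = L^{-1}(p)\,p$
as well, the delta function
$\delta^3(\mathbf{k}'-\mathbf{k})$ is proportional to
$\delta^3(\mathbf{p}'-\mathbf{p})$.
The presence of $\delta^3(\mathbf{p}'-\mathbf{p})$
implies that only the
coefficient at $p' = p$
matters, since otherwise the scalar
product vanishes. Hence, with
$W(L^{-1}(p),p) = 1$,
we have
$$(\Psi_{p',\sigma'},\,\Psi_{p,\sigma}) = |N(p)|^2\,\delta_{\sigma'\sigma}\,\delta^3(\mathbf{k}'-\mathbf{k}). \tag{1.5.14}$$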
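The next step is to determine the proportionality factor that links
$\delta^3(\mathbf{k}'-\mathbf{k})$ to
$\delta^3(\mathbf{p}'-\mathbf{p})$.
When integrating an arbitrary scalar function $f(p)$
over four-momenta subject to $-p^2 = M^2 \geq 0$
and $p^0 > 0$
(corresponding to cases (a) or (c)), the Lorentz-invariant
integral takes the form
$$\int d^4p\,\delta(p^2+M^2)\,\theta(p^0)\,f(p) = \int d^3p\,dp^0\,\delta\big((p^0)^2-\mathbf{p}^2-M^2\big)\,\theta(p^0)\,f(\mathbf{p},p^0) = \int d^3p\,\frac{f\big(\mathbf{p},\sqrt{\mathbf{p}^2+M^2}\big)}{2\sqrt{\mathbf{p}^2+M^2}}.$$
Here, $\theta(p^0)$ is the step function:
$\theta(x) = 0$ for $x < 0$ and $\theta(x) = 1$ for $x > 0$.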
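When integrating over the 'mass shell' $p^2 + M^2 = 0$,
the invariant volume element becomes
$$\frac{d^3p}{\sqrt{\mathbf{p}^2+M^2}}. \tag{1.5.15}$$
By the definition of the delta function,
$$F(\mathbf{p}) = \int F(\mathbf{p}')\,\delta^3(\mathbf{p}-\mathbf{p}')\,d^3p' = \int F(\mathbf{p}')\,\Big[\sqrt{\mathbf{p}'^2+M^2}\;\delta^3(\mathbf{p}'-\mathbf{p})\Big]\,\frac{d^3p'}{\sqrt{\mathbf{p}'^2+M^2}},$$
so we find that the invariant delta function is
$$\sqrt{\mathbf{p}'^2+M^2}\;\delta^3(\mathbf{p}'-\mathbf{p}) = p^0\,\delta^3(\mathbf{p}'-\mathbf{p}). \tag{1.5.16}$$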
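As a rough numerical sanity check of the invariant volume element, the sketch below verifies that the Jacobian of a boost along the three-axis equals $p'^0/p^0$, so that $d^3p/p^0$ is unchanged by the boost. The helper names (`energy`, `boost_z`, `jacobian_dpz`) and the sample numbers are illustrative assumptions, not part of the notes.

```python
import math

def energy(p, mass):
    """On-shell energy p0 = sqrt(|p|^2 + M^2), in units with c = 1."""
    return math.sqrt(sum(c * c for c in p) + mass * mass)

def boost_z(p, mass, beta):
    """Boost the on-shell 3-momentum p = (p1, p2, p3) along the three-axis."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return (p[0], p[1], gamma * (p[2] + beta * energy(p, mass)))

def jacobian_dpz(p, mass, beta, h=1e-6):
    """Central-difference estimate of d(p3')/d(p3) at fixed p1, p2.

    Because p1 and p2 are untouched by the boost, this single derivative
    is the full Jacobian determinant of the map p -> p'.
    """
    up = boost_z((p[0], p[1], p[2] + h), mass, beta)[2]
    down = boost_z((p[0], p[1], p[2] - h), mass, beta)[2]
    return (up - down) / (2.0 * h)

# Assumed sample values
mass, beta = 1.0, 0.6
p = (0.3, -0.2, 0.5)

jac = jacobian_dpz(p, mass, beta)
ratio = energy(boost_z(p, mass, beta), mass) / energy(p, mass)
print(jac, ratio)  # the two numbers should agree to numerical accuracy
```

Agreement of the two printed numbers means $d^3p' = (p'^0/p^0)\,d^3p$ for this boost; rotations leave both $d^3p$ and $p^0$ unchanged, so together they cover a general Lorentz transformation.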
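Given that $p'$ and $p$ are
connected to $k'$ and $k$, respectively, through
the same Lorentz transformation $L(p)$, we
arrive at the following equation:
$$p^0\,\delta^3(\mathbf{p}'-\mathbf{p}) = k^0\,\delta^3(\mathbf{k}'-\mathbf{k}).$$
Consequently, the scalar product becomes
$$(\Psi_{p',\sigma'},\,\Psi_{p,\sigma}) = |N(p)|^2\,\delta_{\sigma'\sigma}\,\frac{p^0}{k^0}\,\delta^3(\mathbf{p}'-\mathbf{p}). \tag{1.5.17}$$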
The normalization constant $N(p)$ is occasionally set
to $N(p) = 1$. However, in doing so,
one would need to keep track of the factor $p^0/k^0$ in scalar
products. Here we use the more common convention
$$N(p) = \sqrt{k^0/p^0}. \tag{1.5.18}$$
With this choice, the scalar product simplifies to
$$(\Psi_{p',\sigma'},\,\Psi_{p,\sigma}) = \delta_{\sigma'\sigma}\,\delta^3(\mathbf{p}'-\mathbf{p}). \tag{1.5.19}$$
Next, we turn our attention to the two physically relevant cases:
particles with mass $M > 0$ and particles with zero
mass.
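For positive mass, the little group is the
three-dimensional rotation group. Its unitary representations can be
decomposed into a direct sum of irreducible unitary representations,
denoted by $D^{(j)}_{\sigma'\sigma}(R)$, of
dimension $2j+1$, where $j$ takes the values 0,
$\tfrac{1}{2}$,
1, etc. These representations can be constructed from the standard
matrices for infinitesimal rotations
$R_{ik} = \delta_{ik} + \Theta_{ik}$,
where $\Theta_{ik} = -\Theta_{ki}$ is infinitesimal. The representation is
given by
$$D^{(j)}_{\sigma'\sigma}(1+\Theta) = \delta_{\sigma'\sigma} + \frac{i}{2}\,\Theta_{ik}\,\big(J^{(j)}_{ik}\big)_{\sigma'\sigma}, \tag{1.5.20}$$
$$\big(J^{(j)}_{23} \pm i J^{(j)}_{31}\big)_{\sigma'\sigma} = \big(J^{(j)}_{1} \pm i J^{(j)}_{2}\big)_{\sigma'\sigma} = \delta_{\sigma',\sigma\pm 1}\,\sqrt{(j\mp\sigma)(j\pm\sigma+1)}, \tag{1.5.21}$$
$$\big(J^{(j)}_{12}\big)_{\sigma'\sigma} = \big(J^{(j)}_{3}\big)_{\sigma'\sigma} = \sigma\,\delta_{\sigma'\sigma}, \tag{1.5.22}$$
where $\sigma$ runs over the values $j, j-1, \dots, -j$ and
gives the component of
angular momentum along the three-axis. For a particle of mass
$M > 0$ and spin $j$,
Equation (1.5.11) becomes
$$U(\Lambda)\,\Psi_{p,\sigma} = \sqrt{\frac{(\Lambda p)^0}{p^0}}\,\sum_{\sigma'} D^{(j)}_{\sigma'\sigma}\big(W(\Lambda,p)\big)\,\Psi_{\Lambda p,\sigma'}. \tag{1.5.23}$$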
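To make the spin-$j$ matrices that generate $D^{(j)}$ concrete, here is a minimal sketch that builds $J_1 \pm iJ_2$ and $J_3$ from the standard matrix elements $\delta_{\sigma',\sigma\pm1}\sqrt{(j\mp\sigma)(j\pm\sigma+1)}$ and $\sigma\,\delta_{\sigma'\sigma}$, and checks the angular-momentum relation $[J_+, J_-] = 2J_3$. The function names and the choice of $j = 3/2$ are assumptions made for illustration.

```python
import math

def spin_matrices(two_j):
    """Return (J_plus, J_minus, J3) for spin j = two_j / 2 as nested lists.

    Rows and columns are labelled by sigma = j, j-1, ..., -j.
    """
    j = two_j / 2.0
    sigmas = [j - n for n in range(two_j + 1)]
    dim = len(sigmas)
    j_plus = [[0.0] * dim for _ in range(dim)]
    j_minus = [[0.0] * dim for _ in range(dim)]
    j3 = [[0.0] * dim for _ in range(dim)]
    for col, s in enumerate(sigmas):
        j3[col][col] = s
        for row, s_prime in enumerate(sigmas):
            if s_prime == s + 1:   # (J1 + i J2) raises sigma by one
                j_plus[row][col] = math.sqrt((j - s) * (j + s + 1))
            if s_prime == s - 1:   # (J1 - i J2) lowers sigma by one
                j_minus[row][col] = math.sqrt((j + s) * (j - s + 1))
    return j_plus, j_minus, j3

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

j_plus, j_minus, j3 = spin_matrices(3)   # j = 3/2, an assumed example
comm = commutator(j_plus, j_minus)
for row_c, row_3 in zip(comm, j3):
    print(row_c, [2 * x for x in row_3])  # the two rows should match
```

The relation $[J_+, J_-] = 2J_3$ is equivalent to $[J_1, J_2] = iJ_3$, which is what makes these matrices a representation of the rotation algebra.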
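In (1.5.23), the little-group element $W(\Lambda,p)$, often
referred to as the Wigner rotation, is given by Equation (1.5.10) as
$$W(\Lambda,p) = L^{-1}(\Lambda p)\,\Lambda\,L(p).$$
Let $\gamma$ be the Lorentz factor of the particle with 4-momentum
$p$, measured relative to the frame in which it is at rest. The relativistic mass of a particle with
4-momentum $p$ (relative to the particle at rest, with 4-momentum
$k$) is
$$\gamma M = p^0 = \sqrt{\mathbf{p}^2+M^2}.$$
Hence, together with the standard momentum $k^\mu = (0,0,0,M)$,
we can write the Lorentz factor as
$$\gamma = \frac{\sqrt{\mathbf{p}^2+M^2}}{M},$$
which gives
$$\sqrt{\gamma^2-1} = \frac{|\mathbf{p}|}{M}.$$
Let $\hat{p}_i \equiv p_i/|\mathbf{p}|$.
Then a choice of $L(p)$
that takes $k$
to $p$
is
$$L^i{}_k(p) = \delta_{ik} + (\gamma-1)\,\hat{p}_i\hat{p}_k,\qquad L^i{}_0(p) = L^0{}_i(p) = \hat{p}_i\sqrt{\gamma^2-1},\qquad L^0{}_0(p) = \gamma.$$
From this we can determine the Wigner rotation and hence the
representation matrix for spin $j$,
$D^{(j)}\big(W(\Lambda,p)\big)$.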
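The construction of the Wigner rotation can be sketched numerically. The code below builds a standard pure boost $L(p)$ for a massive particle, forms $W = L^{-1}(\Lambda p)\,\Lambda\,L(p)$ for a sample $\Lambda$, and checks that $W$ leaves the rest-frame momentum fixed and acts as a rotation on the spatial components. The function names, the component ordering $(t,x,y,z)$, and the sample momenta are assumptions made for this illustration; the notes order components as 1, 2, 3, 0.

```python
import math

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(m, v):
    """Apply a 4x4 matrix to a 4-vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def standard_boost(p3, mass):
    """Pure boost L(p) taking k = (M, 0, 0, 0) to (E, p); order (t, x, y, z)."""
    gamma = math.sqrt(sum(c * c for c in p3) + mass * mass) / mass
    boost = [[0.0] * 4 for _ in range(4)]
    boost[0][0] = gamma
    for i in range(3):
        # p_hat_i * sqrt(gamma^2 - 1) = p_i / M
        boost[0][i + 1] = boost[i + 1][0] = p3[i] / mass
        for j in range(3):
            # delta_ij + (gamma - 1) p_hat_i p_hat_j, rewritten to avoid 0/0 at p = 0
            boost[i + 1][j + 1] = (1.0 if i == j else 0.0) + \
                p3[i] * p3[j] / (mass * mass * (gamma + 1.0))
    return boost

def wigner_rotation(lam, p3, mass):
    """W(Lambda, p) = L^{-1}(Lambda p) Lambda L(p)."""
    e = math.sqrt(sum(c * c for c in p3) + mass * mass)
    q = apply(lam, [e] + list(p3))                          # Lambda p
    inverse = standard_boost([-q[1], -q[2], -q[3]], mass)   # L^{-1}(q) = L(-q)
    return matmul(inverse, matmul(lam, standard_boost(p3, mass)))

# Assumed sample data: Lambda is a boost along y, p lies in the x-z plane.
mass = 1.0
lam = standard_boost([0.0, 0.75, 0.0], 1.0)
w = wigner_rotation(lam, [0.3, 0.0, 0.4], mass)

k = [mass, 0.0, 0.0, 0.0]
wk = apply(w, k)
rot = [row[1:] for row in w[1:]]   # spatial 3x3 block of W
print(wk)
print(rot)
```

Because the two boosts are not collinear, the spatial block comes out as a genuine rotation rather than the identity; for collinear boosts the same code returns the identity up to rounding.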
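For zero mass, take the standard momentum to be $k^\mu = (0,0,\kappa,\kappa)$.
Note that an infinitesimal rotation around the two-axis
followed by an infinitesimal boost along the one-axis
leaves $k$ unchanged to first order, because the changes in $k^1$
produced by the rotation and by the boost cancel.
Likewise, an infinitesimal rotation around the one-axis
followed by an infinitesimal boost along the two-axis
leaves $k$ unchanged. And clearly, an infinitesimal
rotation around the three-axis
leaves $k$ unchanged. Hence, an infinitesimal
little-group transformation can be written as
$$U(W(\theta,\alpha,\beta)) = 1 + i\alpha A + i\beta B + i\theta J_3,$$
where
$$A = J_2 + K_1,\qquad B = -J_1 + K_2.$$
The commutators of these generators are
$$[J_3, A] = +iB,\qquad [J_3, B] = -iA,\qquad [A, B] = 0.$$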
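The structure of the massless little group can be checked with explicit $4\times4$ matrices. The sketch below uses real generators $G$ (so that a finite transformation is $\exp(\theta G)$) in the ordering $(t,x,y,z)$; these differ from Hermitian generators by factors of $i$ and by sign conventions, so the combinations `A` and `B` below are only analogues of the "rotation plus boost" generators discussed above. All names and conventions in the code are assumptions for illustration.

```python
def generator(entries):
    """Build a 4x4 integer matrix from a {(row, col): value} dictionary."""
    g = [[0] * 4 for _ in range(4)]
    for (r, c), v in entries.items():
        g[r][c] = v
    return g

def combine(a, b, sign):
    """Return a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(4)] for i in range(4)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(4)] for i in range(4)]

def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

# Real generators in the order (t, x, y, z); sign conventions are assumed.
K1 = generator({(0, 1): 1, (1, 0): 1})     # boost along x
K2 = generator({(0, 2): 1, (2, 0): 1})     # boost along y
R1 = generator({(2, 3): -1, (3, 2): 1})    # rotation about x
R2 = generator({(1, 3): 1, (3, 1): -1})    # rotation about y
R3 = generator({(1, 2): -1, (2, 1): 1})    # rotation about z

A = combine(K1, R2, -1)   # boost along x combined with a rotation about y
B = combine(K2, R1, +1)   # boost along y combined with a rotation about x
k = [1, 0, 0, 1]          # light-like standard momentum along z (kappa = 1)
zero = [[0] * 4 for _ in range(4)]
minus_A = [[-x for x in row] for row in A]

print(apply(A, k), apply(B, k), apply(R3, k))   # all three annihilate k
print(commutator(A, B) == zero)                  # the two "translations" commute
print(commutator(R3, A) == B, commutator(R3, B) == minus_A)
```

The two commuting generators rotate into each other under `R3`, which is the algebra of translations and rotations of the Euclidean plane, i.e. the group ISO(2).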
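Since $A$ and $B$ commute, we can simultaneously diagonalize them,
with eigenstates $\Psi_{k,a,b}$ such
that
$$A\,\Psi_{k,a,b} = a\,\Psi_{k,a,b},\qquad B\,\Psi_{k,a,b} = b\,\Psi_{k,a,b}.$$
However, if either $a$ or $b$
is non-zero, then we find a continuous spectrum of
$A$ and $B$, i.e.
$$A\,\Psi^\theta_{k,a,b} = (a\cos\theta - b\sin\theta)\,\Psi^\theta_{k,a,b},\qquad B\,\Psi^\theta_{k,a,b} = (a\sin\theta + b\cos\theta)\,\Psi^\theta_{k,a,b},$$
where
$$\Psi^\theta_{k,a,b} \equiv U^{-1}(R(\theta))\,\Psi_{k,a,b},$$
which contradicts our assumption that $\sigma$ is
discrete (experiment does not find a continuous label such as $\theta$
for one-particle states). Hence, for physical states, we must have
$$a = b = 0.$$
(For the case when $a \neq 0$ or $b \neq 0$, see arXiv:1302.1198.) Hence, for a physical state
$\Psi_{k,\sigma}$, we must have
$$A\,\Psi_{k,\sigma} = B\,\Psi_{k,\sigma} = 0.$$
Here $\sigma$ is the eigenvalue of $J_3$
(now that $a = b = 0$, $\Psi_{k,\sigma}$
is a common eigenstate of
$A$, $B$,
and $J_3$, even though $J_3$ commutes with neither
$A$ nor $B$), such
that
$$J_3\,\Psi_{k,\sigma} = \sigma\,\Psi_{k,\sigma}.$$
Since $\mathbf{k}$ points along the three-axis, $\sigma$
gives the component of angular momentum in the direction of motion;
$\sigma$ is called the helicity.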
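We are now ready to find the representation of the little group.
Writing a general little-group element as $W(\theta,\alpha,\beta) = S(\alpha,\beta)\,R(\theta)$, we have
$$U(W)\,\Psi_{k,\sigma} = \exp(i\alpha A + i\beta B)\,\exp(i\theta J_3)\,\Psi_{k,\sigma} = \exp(i\theta\sigma)\,\Psi_{k,\sigma}.$$
Therefore,
$$D_{\sigma'\sigma}(W) = \exp(i\theta\sigma)\,\delta_{\sigma'\sigma},$$
where $\theta$ is determined by
$$U(W) = U\big(S(\alpha,\beta)\big)\,U\big(R(\theta)\big).$$
Instead of unitary operators acting on the Hilbert space, we prefer to use the corresponding Lorentz transformation identity
$$W(\theta,\alpha,\beta) = S(\alpha,\beta)\,R(\theta),$$
where, with rows and columns in the order 1, 2, 3, 0,
$$S(\alpha,\beta) = \begin{pmatrix} 1 & 0 & -\alpha & \alpha \\ 0 & 1 & -\beta & \beta \\ \alpha & \beta & 1-\zeta & \zeta \\ \alpha & \beta & -\zeta & 1+\zeta \end{pmatrix},\qquad \zeta \equiv \frac{\alpha^2+\beta^2}{2},$$
and
$$R(\theta) = \begin{pmatrix} \cos\theta & \sin\theta & 0 & 0 \\ -\sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
On the other hand,
$$W(\Lambda,p) = L^{-1}(\Lambda p)\,\Lambda\,L(p) = S\big(\alpha(\Lambda,p),\beta(\Lambda,p)\big)\,R\big(\theta(\Lambda,p)\big),$$
which defines $\theta(\Lambda,p)$.
Therefore,
$$U(\Lambda)\,\Psi_{p,\sigma} = \frac{N(p)}{N(\Lambda p)}\,\exp\big(i\sigma\theta(\Lambda,p)\big)\,\Psi_{\Lambda p,\sigma}.$$
Hence, with the normalization (1.5.18),
$$U(\Lambda)\,\Psi_{p,\sigma} = \sqrt{\frac{(\Lambda p)^0}{p^0}}\,\exp\big(i\sigma\theta(\Lambda,p)\big)\,\Psi_{\Lambda p,\sigma},$$
where we choose $L(p)$ to take $k$ to $p$,
$$L(p) = R(\hat{\mathbf{p}})\,B(|\mathbf{p}|/\kappa),$$
where $B(u)$ is a pure boost along the three-direction and
$R(\hat{\mathbf{p}})$, with $\hat{\mathbf{p}} \equiv \mathbf{p}/|\mathbf{p}|$,
is a pure rotation that takes the three-axis
into the direction of $\hat{\mathbf{p}}$.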
In constructing covariant free fields for massless particles with
helicity $\pm 1$ (such as photons), one encounters a
significant complication; see Section ?. A field like the
four-potential $a_\mu(x)$, as given by
Eq. (?), while commonly used, does not transform as a true
four-vector under Lorentz transformations. This presents a problem when
attempting to write a Lorentz-covariant quantum field theory. Before
turning to this issue, recall that we can define an antisymmetric
tensor field $f_{\mu\nu}(x)$ for massless spin-1 particles
without difficulty. This tensor field is related to the four-potential
$a_\mu(x)$ by the well-known
expression (just as in classical electromagnetism):
$$f_{\mu\nu}(x) = \partial_\mu a_\nu(x) - \partial_\nu a_\mu(x). \tag{2.1.1}$$
However, as shown in Eq. (?), the four-potential $a_\mu(x)$
does not transform purely as a four-vector under Lorentz
transformations; rather, it transforms as a four-vector only up to a
gauge transformation. That is, under a Lorentz transformation
$\Lambda$, the field transforms according to
$$U(\Lambda)\,a_\mu(x)\,U^{-1}(\Lambda) = \Lambda^\nu{}_\mu\,a_\nu(\Lambda x) + \partial_\mu\Omega(x,\Lambda),$$
where $\Omega(x,\Lambda)$ is a function that depends on the
coordinates and the Lorentz transformation, and represents the gauge
freedom inherent in the theory. This additional gradient term reflects
the non-covariant behavior of $a_\mu(x)$ under Lorentz
transformations, a key feature of massless vector fields like the
photon. The implication is that, even though the field strength
$f_{\mu\nu}$ itself transforms covariantly (since it is
gauge-invariant), the potential $a_\mu$ does not. This
is a manifestation of the gauge redundancy present in theories of
massless spin-1 fields, such as quantum electrodynamics (QED).
In the case of massless spin-1 particles, such as photons, a significant
structural limitation arises when attempting to construct covariant
quantum fields. Specifically, it is impossible to build a true Lorentz
four-vector field as a linear combination of creation and annihilation
operators associated only with helicity $\pm 1$
states.
This stands in sharp contrast to the situation for massive spin-1
particles, where the field operator, such as the Proca field,
can be constructed from the full set of polarization states
$\sigma = +1, 0, -1$, and transforms properly as
a four-vector under Lorentz transformations.
The key issue is that, for massless particles, only the transverse
polarizations with helicities $\pm 1$ correspond to
physical states. The longitudinal polarization vector, which is
essential in the massive case for forming a complete Lorentz vector,
becomes unphysical as the mass goes to zero. Although the longitudinal
component contributes to the field operator of the massive theory, it
ultimately decouples from physical matrix elements due to current
conservation. However, this decoupling does not remove its mathematical
role in ensuring the Lorentz covariance of the field operator.
Therefore, when taking the massless limit $m \to 0$,
one
cannot simply discard the longitudinal mode without losing the ability
to maintain manifest Lorentz covariance.
This shows up clearly in the propagator of a massive vector field. The covariant part of the propagator for the Proca field takes the form
$$\Delta_{\mu\nu}(x,y) = (2\pi)^{-4}\int d^4q\;\frac{\eta_{\mu\nu} + q_\mu q_\nu/m^2}{q^2+m^2-i\epsilon}\;e^{iq\cdot(x-y)},$$
and one immediately sees that the second term in the numerator of the
integrand becomes singular as $m \to 0$.
This divergence is not merely a technical problem; it reflects a deeper
physical truth: the longitudinal component required to complete the
four-vector structure becomes ill-defined in the massless limit. In
other words, the theory does not admit a smooth transition from the
massive to the massless case at the level of the covariant field
operator.
The underlying reason for this difficulty lies in the representation
theory of the Poincaré group. For massive particles, the little
group is $SO(3)$, and one can build
covariant fields corresponding to finite-dimensional irreducible
representations. In contrast, for massless particles, the little group
is $ISO(2)$, which includes not only
helicity (rotations around the direction of motion) but also
“translations” in the plane transverse to the momentum.
These translation-like generators do not act trivially on the
polarization vectors and correspond to gauge transformations in
field-theoretic language. As a result, any attempt to construct a
covariant field from only helicity eigenstates necessarily introduces
gauge redundancy: the field can at best transform covariantly up to a
gauge transformation.
This explains why the four-potential $a_\mu(x)$,
though commonly used, does not transform as a true four-vector. Instead,
under Lorentz transformations, it picks up an additional gradient term,
a manifestation of gauge freedom. This is a direct reflection of
the impossibility of representing helicity $\pm 1$
states within a true vector representation of the Lorentz group. The
singularity in the propagator at $m = 0$ is thus not an
artifact of poor regularization or bad limits, but a genuine structural
signal: it tells us that the massless theory must be formulated
differently, not through a Proca-like field, but via gauge fields
with constrained degrees of freedom, such as in quantum electrodynamics.
We could avoid the complications arising from the non-covariant
transformation properties of the gauge potential $A_\mu(x)$
by imposing a strong constraint on the form of the theory: namely, that
all interactions should involve only the field strength tensor
$F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu$
and its derivatives, and not $A_\mu$ itself.
(We use $A_\mu$ and $F_{\mu\nu}$, instead of $a_\mu$
and $f_{\mu\nu}$, for the electromagnetic potential
vector and the field strength tensor because these are interacting
fields.)
Since $F_{\mu\nu}$ is manifestly invariant under the
gauge transformation
$$A_\mu(x) \to A_\mu(x) + \partial_\mu\epsilon(x), \tag{2.1.2}$$
a theory built entirely from $F_{\mu\nu}$ and its derivatives
would automatically be invariant under gauge transformations. It would
also avoid the problem that $A_\mu$,
as discussed earlier (see Eq. (?)), transforms only up to a
gauge term under Lorentz transformations.
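The gauge invariance of the field strength can be illustrated on a small lattice, where derivatives are replaced by forward differences. Because forward differences in different directions commute, the lattice analogue of $F_{01} = \partial_0 A_1 - \partial_1 A_0$ is unchanged when $A_\mu$ is shifted by the difference of an arbitrary function. This is only a sketch: the two-dimensional periodic lattice, its size `N`, and the sample field values are assumptions chosen for brevity.

```python
import math

N = 6  # sites per direction on a periodic two-dimensional lattice (assumed)

def forward_diff(f, mu):
    """Forward lattice difference of f[t][x] in direction mu (0: t, 1: x)."""
    out = [[0.0] * N for _ in range(N)]
    for t in range(N):
        for x in range(N):
            if mu == 0:
                out[t][x] = f[(t + 1) % N][x] - f[t][x]
            else:
                out[t][x] = f[t][(x + 1) % N] - f[t][x]
    return out

def field_strength(a):
    """Lattice analogue of F_01 = d_0 A_1 - d_1 A_0 for a = [A_0, A_1]."""
    d0a1 = forward_diff(a[1], 0)
    d1a0 = forward_diff(a[0], 1)
    return [[d0a1[t][x] - d1a0[t][x] for x in range(N)] for t in range(N)]

def gauge_transform(a, eps):
    """A_mu -> A_mu + d_mu eps, with d_mu the forward difference."""
    shifted = []
    for mu in range(2):
        d_eps = forward_diff(eps, mu)
        shifted.append([[a[mu][t][x] + d_eps[t][x] for x in range(N)]
                        for t in range(N)])
    return shifted

# Arbitrary sample potential and gauge function (assumed values)
a = [[[math.sin(1.3 * t + 0.7 * x + mu) for x in range(N)] for t in range(N)]
     for mu in range(2)]
eps = [[math.cos(0.4 * t * x + 0.2 * t) for x in range(N)] for t in range(N)]

f_before = field_strength(a)
f_after = field_strength(gauge_transform(a, eps))
max_change = max(abs(f_after[t][x] - f_before[t][x])
                 for t in range(N) for x in range(N))
print(max_change)  # zero up to floating-point rounding
```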
However, such a restriction would be overly rigid: it does not
describe the most general class of interactions, and crucially, it is
not the structure realized in nature. Physical theories such as quantum
electrodynamics (QED) include interaction terms where $A_\mu$
appears explicitly, as in the minimal coupling term
$$\int d^4x\;A_\mu(x)\,J^\mu(x),$$
which cannot be written purely in terms of $F_{\mu\nu}$.
For this reason, we do not banish $A_\mu$ from the
theory. Instead, we retain $A_\mu$ as a dynamical
variable, and impose a compensating symmetry requirement: that the
matter action $I_M$, which includes the matter fields and their interaction
with the gauge field, must be invariant under general gauge
transformations of the form (2.1.2), at least when the
matter fields obey their equations of motion.
This approach ensures that the unphysical degrees of freedom associated
with the gauge redundancy in $A_\mu$ do not affect
physical observables, even though $A_\mu$ itself is not
gauge invariant. If we allow $A_\mu$ to shift by
$\partial_\mu\epsilon(x)$, then the variation of $I_M$
is given formally by
$$\delta I_M = \int d^4x\;\frac{\delta I_M}{\delta A_\mu(x)}\,\partial_\mu\epsilon(x). \tag{2.1.3}$$
This expression arises from a general principle in field theory: when a
functional depends on a field $A_\mu(x)$,
its variation under a change in that field is obtained by integrating
the functional derivative times the variation of the field.
To proceed, we apply integration by parts to this expression, under the
assumption that $\epsilon(x)$ vanishes sufficiently rapidly at
infinity so that boundary terms can be neglected. This gives
$$\delta I_M = -\int d^4x\;\epsilon(x)\,\partial_\mu\frac{\delta I_M}{\delta A_\mu(x)}.$$
For the action to be gauge invariant, i.e., for
$\delta I_M = 0$ with arbitrary $\epsilon(x)$, we need
$$\partial_\mu\frac{\delta I_M}{\delta A_\mu(x)} = 0. \tag{2.1.4}$$
This is a condition imposed not on $\epsilon(x)$,
which is arbitrary, but on the structure of the action itself. It
ensures that even though $A_\mu$ transforms
inhomogeneously under gauge transformations, the matter action remains
invariant. The significance of this condition will become clearer
shortly, once we interpret $\delta I_M/\delta A_\mu(x)$ as the source current
for the gauge field.
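In special cases, this condition is trivially satisfied. For example, if
the matter action $I_M$ depends only on the
gauge-invariant tensor $F_{\mu\nu}$, and
not on $A_\mu$ itself, then the functional derivative
can be computed explicitly using the chain rule:
$$\frac{\delta I_M}{\delta A_\mu(x)} = \int d^4y\;\frac{\delta I_M}{\delta F_{\rho\lambda}(y)}\,\frac{\delta F_{\rho\lambda}(y)}{\delta A_\mu(x)}.$$
Using $F_{\rho\lambda} = \partial_\rho A_\lambda - \partial_\lambda A_\rho$, we vary each term
with respect to $A_\mu(x)$. By
definition of functional differentiation,
$\delta A_\lambda(y)/\delta A_\mu(x) = \delta^\mu_\lambda\,\delta^4(y-x)$.
When a derivative acts on the field, the corresponding functional
derivative produces a derivative of the delta function:
$$\frac{\delta\big(\partial_\rho A_\lambda(y)\big)}{\delta A_\mu(x)} = \delta^\mu_\lambda\,\frac{\partial}{\partial y^\rho}\,\delta^4(y-x),$$
and similarly
$\delta\big(\partial_\lambda A_\rho(y)\big)/\delta A_\mu(x) = \delta^\mu_\rho\,\partial_\lambda^{(y)}\delta^4(y-x)$. Subtracting
these gives
$$\frac{\delta F_{\rho\lambda}(y)}{\delta A_\mu(x)} = \delta^\mu_\lambda\,\partial_\rho^{(y)}\delta^4(y-x) - \delta^\mu_\rho\,\partial_\lambda^{(y)}\delta^4(y-x).$$
Substituting this back into the chain rule and collecting indices, we obtain
$$\frac{\delta I_M}{\delta A_\mu(x)} = \int d^4y\,\left[\frac{\delta I_M}{\delta F_{\rho\mu}(y)}\,\partial_\rho^{(y)}\delta^4(y-x) - \frac{\delta I_M}{\delta F_{\mu\lambda}(y)}\,\partial_\lambda^{(y)}\delta^4(y-x)\right].$$
We now integrate by parts in $y$,
moving derivatives off the delta functions and onto the functional
derivatives; surface terms vanish under standard boundary conditions.
Using
$\int d^4y\;G(y)\,\partial_\rho^{(y)}\delta^4(y-x) = -\partial_\rho G(x)$, the two terms become
$$\frac{\delta I_M}{\delta A_\mu(x)} = -\partial_\rho\frac{\delta I_M}{\delta F_{\rho\mu}(x)} + \partial_\lambda\frac{\delta I_M}{\delta F_{\mu\lambda}(x)}.$$
Because $F_{\mu\nu}$ is antisymmetric, the functional
derivative $\delta I_M/\delta F_{\mu\nu}$
is also antisymmetric:
$\delta I_M/\delta F_{\rho\mu} = -\,\delta I_M/\delta F_{\mu\rho}$. Using this antisymmetry to relabel indices,
the two terms add to the same structure and we arrive at
$$\frac{\delta I_M}{\delta A_\mu(x)} = 2\,\partial_\nu\frac{\delta I_M}{\delta F_{\mu\nu}(x)}. \tag{2.1.5}$$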
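Thus, the functional derivative of the action with respect to
$A_\mu$ is given by the divergence of an antisymmetric quantity. Taking another
divergence yields
$$\partial_\mu\frac{\delta I_M}{\delta A_\mu(x)} = 2\,\partial_\mu\partial_\nu\frac{\delta I_M}{\delta F_{\mu\nu}(x)} = 0,$$
since the symmetric combination $\partial_\mu\partial_\nu$ is contracted with an antisymmetric tensor.
Therefore, if $I_M$ depends only on
$F_{\mu\nu}$, the condition in Eq. (2.1.4) is
satisfied automatically: gauge invariance is guaranteed by
construction. Moreover, Eq. (2.1.5) also holds when
$I_M$ depends only on $F_{\mu\nu}$ and its
derivatives, but we omit the calculation.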
However, if $I_M$ involves $A_\mu$
itself, the expression
$\delta I_M/\delta A_\mu(x)$ will generally not be a
total derivative, and hence the vanishing of its divergence becomes a
non-trivial constraint. In such theories, gauge invariance imposes a
dynamical condition on the form of the interaction between matter and
gauge fields, one that is often interpreted (in later steps) as
the conservation of a physical current.
The question is what sort of matter theory provides conserved currents
suitable for coupling to a vector field $A_\mu(x)$.
As established earlier, infinitesimal internal symmetries of the matter
action $I_M$ yield conserved currents by Noether's
theorem.
Let $\Psi^\ell(x)$ be the matter fields, carrying real-valued
charges $q_\ell$ under a global
$U(1)$ internal symmetry. An infinitesimal symmetry transformation is written
$$\delta\Psi^\ell(x) = i\,\epsilon(x)\,q_\ell\,\Psi^\ell(x). \tag{2.1.6}$$
This transformation corresponds to a local phase rotation of the field
$\Psi^\ell$, weighted by its charge $q_\ell$.
However, suppose we consider only the case where $\epsilon$
is constant. In that case, we say the symmetry is global, and we assume
that this constant transformation leaves the matter action
$I_M$ invariant. This invariance under global transformations is
sufficient to imply, via Noether's theorem, the existence of a conserved
current.
To extract the form of that current, we allow $\epsilon(x)$ to
vary in spacetime, i.e., we “localize” the symmetry,
and examine how the matter action responds. Since the
transformation (2.1.6) depends on
$\epsilon(x)$, its spacetime variation introduces derivatives of
$\epsilon(x)$ into the variation of the action. Specifically,
the change in the matter action takes the general form
$$\delta I_M = -\int d^4x\;J^\mu(x)\,\partial_\mu\epsilon(x) \tag{2.1.7}$$
for some vector function $J^\mu(x)$ constructed from the
matter fields and their derivatives. This is the Noether current
associated with the internal $U(1)$
symmetry. Only the derivative of $\epsilon$ appears because the
variation must vanish when $\epsilon$ is constant; the overall minus
sign is a convention.
Now, if we evaluate this variation on shell, meaning that
the matter fields satisfy their classical equations of motion,
then the action must be stationary under such variations. Integrating
by parts, the variation becomes
$\int d^4x\;\epsilon(x)\,\partial_\mu J^\mu(x)$, and since
$\epsilon(x)$ is arbitrary and smooth, the only way the integral
can vanish for all such $\epsilon(x)$ is if
$$\partial_\mu J^\mu(x) = 0. \tag{2.1.8}$$
This is the local conservation law for the current
$J^\mu$, valid when the matter field equations are
satisfied.
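In Section ?, we see how this conserved current arises in
practice: if the matter Lagrangian is a function of the fields and their
first derivatives, that is, if
$I_M = \int d^4x\;\mathcal{L}_M\big(\Psi^\ell(x), \partial_\mu\Psi^\ell(x)\big)$,
then the current $J^\mu$ can be explicitly constructed
by Noether's procedure. The result is a local function of the fields and
their derivatives:
$$J^\mu = -i\sum_\ell \frac{\partial\mathcal{L}_M}{\partial(\partial_\mu\Psi^\ell)}\,q_\ell\,\Psi^\ell,$$
and this generates the transformations (2.1.6) in the sense that
$$[Q, \Psi^\ell(x)] = -q_\ell\,\Psi^\ell(x), \tag{2.1.9}$$
where $Q$ is the time-independent charge operator
$$Q = \int d^3x\;J^0. \tag{2.1.10}$$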
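As a worked illustration of Noether's procedure, consider a single complex scalar field $\phi$ of charge $q$ (so that $\phi^*$ carries charge $-q$). This example and its Lagrangian density are assumptions introduced here for concreteness, written in the metric convention of the Notation paragraph; the current is computed from $J^\mu = -i\sum_\ell \big[\partial\mathcal{L}_M/\partial(\partial_\mu\Psi^\ell)\big]\,q_\ell\,\Psi^\ell$ with $\Psi^\ell \in \{\phi, \phi^*\}$ treated as classical commuting fields.

```latex
\begin{align*}
% Assumed example Lagrangian density (signature +,+,+,-)
\mathcal{L}_M &= -\partial_\mu\phi^*\,\partial^\mu\phi - m^2\,\phi^*\phi , \\
% Noether current: phi has charge q, phi^* has charge -q
J^\mu &= -i\Big[(-\partial^\mu\phi^*)\,q\,\phi + (-\partial^\mu\phi)\,(-q)\,\phi^*\Big]
       = i q\,\big(\phi\,\partial^\mu\phi^* - \phi^*\,\partial^\mu\phi\big) , \\
% Conservation on shell, using the field equations Box phi = m^2 phi, Box phi^* = m^2 phi^*
\partial_\mu J^\mu &= i q\,\big(\phi\,\Box\phi^* - \phi^*\,\Box\phi\big)
       = i q\,m^2\,\big(\phi\,\phi^* - \phi^*\phi\big) = 0 .
\end{align*}
```

The cross terms $\partial_\mu\phi\,\partial^\mu\phi^*$ cancel identically, so the divergence vanishes only after the field equations are used, in line with the on-shell nature of the conservation law.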
This setup now provides a natural and physically meaningful way to
introduce interactions with a vector field $A_\mu(x)$.
Since we have a conserved current $J^\mu$,
the simplest Lorentz-invariant and gauge-compatible interaction is to
couple the field $A_\mu$ directly to the current. This
can be expressed by identifying the functional derivative of the
matter-plus-interaction action $I_M$ with respect to $A_\mu(x)$
as the current:
$$\frac{\delta I_M}{\delta A_\mu(x)} = J^\mu(x). \tag{2.1.11}$$
The conservation of electric charge only allows us to determine the
relative values of the charges assigned to different fields: it
tells us that the ratios between these charges must remain fixed.
However, it does not by itself determine the absolute normalization of
the charges. For example, if we know that one field has twice the charge
of another, we still need to specify what that “unit” charge
is. Conventionally, we define the elementary unit of electric charge,
denoted $e$, by assigning the
value $-e$ to the electron. All other electric
charges in the theory are then measured relative to this fundamental
scale. Importantly, it is equation (2.1.11) that gives a
precise operational meaning to the value of
$e$: it tells us how the current couples to the gauge
field and therefore determines the physical strength of the
electromagnetic interaction. Without such a coupling term, the notion of
charge would remain abstract, purely a symmetry label without
physical consequences.
This requirement—that the matter action couples to the vector potential via the current—is more than just a convenient prescription; it can be restated as a fundamental invariance principle. Specifically, the action is required to be invariant under simultaneous local transformations of both the gauge field and the matter fields. The transformation of the gauge field is given by
$$\delta A_\mu(x) = \partial_\mu\epsilon(x), \tag{2.1.12}$$
and the matter fields transform as
$$\delta\Psi^\ell(x) = i\,\epsilon(x)\,q_\ell\,\Psi^\ell(x), \tag{2.1.13}$$
where $\epsilon(x)$ is an arbitrary real function on
spacetime, and $q_\ell$ is the charge associated with the
field $\Psi^\ell$. A symmetry of this
kind, where the transformation parameter is a spacetime function
rather than a constant, is referred to as a local symmetry, or more
precisely, a gauge invariance of the second kind. In contrast, the more
familiar case of a transformation with constant
$\epsilon$ is called a global symmetry, or a gauge
invariance of the first kind.
Exact local symmetries, like the $U(1)$ symmetry
of electromagnetism, are now known to be fundamental to our best
theories of nature. They play an essential role in formulating
interactions between fields and are preserved at both the classical and
quantum levels. On the other hand, purely global symmetries seem not to
be fundamental. They are now often interpreted as accidental
consequences of local symmetries and other structural constraints in the
theory. Further discussion of this perspective is in Section ?.
Up to this point, the discussion has focused on the interaction of matter fields with the gauge field. But to fully specify the dynamics of the system, we must also provide an action for the gauge field itself—that is, for the photon. A natural guess is to use the same form as the action for a massive vector field, but now with the mass set to zero. This leads to the classical electromagnetic action:
$$I_\gamma = -\frac{1}{4}\int d^4x\;F_{\mu\nu}F^{\mu\nu}, \tag{2.1.14}$$
where $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ is the antisymmetric field strength
tensor. Although this is the same as the action used in classical
electrodynamics, its real justification comes from quantum field theory:
it is, up to a multiplicative constant, the unique Lorentz-invariant and
gauge-invariant action that is quadratic in $F_{\mu\nu}$ and
involves no higher derivatives. This minimality ensures both the correct
degrees of freedom for a massless spin-1 field and a well-behaved
quantum theory. Any other gauge-invariant contributions involving
higher-order terms in $F_{\mu\nu}$, or
additional derivatives, are typically suppressed and treated as
effective corrections that can be absorbed into the matter sector of the
theory.
Given the interaction rule (2.1.11) and the free action (2.1.14), we can
now derive the field equations for the electromagnetic field by varying
the total action with respect to $A_\nu$.
This yields
$$0 = \frac{\delta}{\delta A_\nu}\big[I_\gamma + I_M\big] = \partial_\mu F^{\mu\nu} + J^\nu, \tag{2.1.15}$$
which are immediately recognized as the inhomogeneous Maxwell equations,
where the current $J^\nu$ acts as the source of the
electromagnetic field. These equations encapsulate Gauss's law and
Ampère's law (with Maxwell's correction) in covariant form.
In addition to the inhomogeneous Maxwell equations, there are also
homogeneous equations that follow directly from the definition of the
field strength tensor $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$.
Specifically, the identity
$$\partial_\mu F_{\nu\lambda} + \partial_\nu F_{\lambda\mu} + \partial_\lambda F_{\mu\nu} = 0 \tag{2.1.16}$$
holds automatically. These are not dynamical equations but rather geometric constraints, often referred to as the Bianchi identities. In terms of physical content, they express the absence of magnetic monopoles and encode Faraday's law of induction.
This framework, built on local gauge invariance, not only reproduces Maxwell's equations as field equations derived from a variational principle but also underpins the full quantum theory of electrodynamics. It makes precise the meaning of electric charge, determines the allowed forms of interactions, and provides a consistent and predictive theory of massless spin-1 fields.
In the above discussion, we began by postulating the existence of massless spin-one particles, and from this starting point, we were led to infer that the matter action must be invariant under local gauge transformations, specifically those given by equations (2.1.12) and (2.1.13):
$$\delta A_\mu(x) = \partial_\mu\epsilon(x),\qquad \delta\Psi^\ell(x) = i\,\epsilon(x)\,q_\ell\,\Psi^\ell(x).$$
In this formulation, gauge invariance is a derived consequence — we infer the structure of local symmetry transformations from the requirement that a massless spin-one field be consistently coupled to a conserved current, while preserving Lorentz invariance and locality. But this logical flow is in fact the reverse of how gauge theory is traditionally presented.
Ordinarily, one begins not with a massless vector field but with a global internal symmetry of the matter fields. For instance, we may suppose the action is invariant under the transformation:
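$$\delta\Psi^\ell(x) = i\,\epsilon\,q_\ell\,\Psi^\ell(x), \tag{2.1.17}$$
where $\epsilon$ is a real constant. The key question then
becomes: how can this symmetry be generalized to allow $\epsilon$
to depend on spacetime position? That is, what must be done to promote
this to a local symmetry,
$$\delta\Psi^\ell(x) = i\,\epsilon(x)\,q_\ell\,\Psi^\ell(x), \tag{2.1.18}$$
with $\epsilon(x)$ now an arbitrary smooth function?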
If the matter Lagrangian depended only on the fields
$\Psi^\ell(x)$, and not on their derivatives, then it would
make no difference whether $\epsilon$ is constant or a
function of $x$. In such a
purely algebraic theory, invariance under global symmetry would
automatically imply invariance under local symmetry. However, realistic
field theories always involve derivatives of the fields: kinetic
terms, in particular, are essential both classically and quantum
mechanically. And it is in the presence of these derivatives that a
problem arises.
The issue is that derivatives of the fields do not transform in the same way as the fields themselves under a local transformation. Explicitly, we compute:
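$$\delta\,\partial_\mu\Psi^\ell(x) = i\,\epsilon(x)\,q_\ell\,\partial_\mu\Psi^\ell(x) + i\,q_\ell\,\Psi^\ell(x)\,\partial_\mu\epsilon(x), \tag{2.1.19}$$
so that $\partial_\mu\Psi^\ell$
picks up an extra term proportional to
$\partial_\mu\epsilon(x)$. This spoils the
transformation property needed for local invariance. In a Lagrangian
that includes $\partial_\mu\Psi^\ell$, this
additional term generally means that the action is no longer invariant under
local phase rotations, even though it may be invariant under global
ones.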
To restore invariance, we introduce a new field
$A_\mu(x)$, which transforms in a way that cancels out the
unwanted inhomogeneous term. Specifically, we take
$$\delta A_\mu(x) = \partial_\mu\epsilon(x), \tag{2.1.20}$$
and construct the covariant derivative
$$D_\mu\Psi^\ell \equiv \partial_\mu\Psi^\ell - i\,q_\ell\,A_\mu\,\Psi^\ell, \tag{2.1.21}$$
which is designed so that under the local transformations (2.1.18) and (2.1.20) we have
$$\delta\,D_\mu\Psi^\ell(x) = i\,\epsilon(x)\,q_\ell\,D_\mu\Psi^\ell(x), \tag{2.1.22}$$
exactly matching the transformation law of $\Psi^\ell$
itself. The significance of this construction is that any Lagrangian
formed solely out of $\Psi^\ell$, its
covariant derivatives $D_\mu\Psi^\ell$, and
their conjugates will now be invariant under the local symmetry,
provided that it was invariant under the global version.
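The covariance of $D_\mu\Psi$ can be checked numerically in one dimension. The sketch below uses the finite (exponentiated) form of the transformation, $\Psi \to e^{iq\epsilon(x)}\Psi$ and $A \to A + \partial\epsilon$, with $D\Psi = \partial\Psi - iqA\Psi$, and approximates derivatives by central differences. The sample functions `psi`, `A`, `eps` and the charge `q` are arbitrary assumptions chosen only for the illustration.

```python
import cmath
import math

q = 2.0  # assumed charge of the field

def psi(x):
    """Sample complex matter field (assumed form)."""
    return cmath.exp(1j * 0.7 * x) * (1.0 + 0.3 * x)

def A(x):
    """Sample gauge potential (assumed form)."""
    return 0.5 * math.sin(x)

def eps(x):
    """Sample gauge function (assumed form)."""
    return 0.2 * x * x

def d(f, x, h=1e-5):
    """Central-difference derivative of a real or complex function."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def psi_t(x):
    """Gauge-transformed matter field exp(i q eps) psi."""
    return cmath.exp(1j * q * eps(x)) * psi(x)

def A_t(x):
    """Gauge-transformed potential A + d(eps)/dx."""
    return A(x) + d(eps, x)

def D(field, potential, x):
    """Covariant derivative d(field)/dx - i q potential * field."""
    return d(field, x) - 1j * q * potential(x) * field(x)

x0 = 0.8
phase = cmath.exp(1j * q * eps(x0))
covariant_mismatch = abs(D(psi_t, A_t, x0) - phase * D(psi, A, x0))
ordinary_mismatch = abs(d(psi_t, x0) - phase * d(psi, x0))
print(covariant_mismatch, ordinary_mismatch)
```

The first number is zero up to discretization error, while the second is of order $q\,|\partial\epsilon|\,|\Psi|$: the ordinary derivative picks up the inhomogeneous term that the shift of $A$ cancels.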
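A matter Lagrangian density $\mathcal{L}_M(\Psi, D_\mu\Psi)$ will be invariant
under the local transformations (2.1.18) and (2.1.20),
with $\epsilon(x)$ an arbitrary function, if it is invariant
when $\epsilon$ is a constant function on spacetime. With a
Lagrangian of this form, we have
$$\frac{\delta I_M}{\delta A_\mu(x)} = -i\sum_\ell \frac{\partial\mathcal{L}_M}{\partial(D_\mu\Psi^\ell)}\,q_\ell\,\Psi^\ell(x) = -i\sum_\ell \frac{\partial\mathcal{L}_M}{\partial(\partial_\mu\Psi^\ell)}\,q_\ell\,\Psi^\ell(x), \tag{2.1.23}$$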
which is the same as Eq. (2.1.11). (More generally, one
could also include terms involving the field strength $F_{\mu\nu}$
and its derivatives in the matter Lagrangian, in which case such terms
would contribute to $J^\mu$ as well.) Viewed from this
perspective, the masslessness of the field $A_\mu$ is
not an independent postulate, but rather a direct consequence of gauge
invariance. Any attempt to add a mass term of the form
$-\tfrac{1}{2}m^2 A_\mu A^\mu$
to the Lagrangian would explicitly break the gauge symmetry, and is
therefore forbidden if local gauge invariance is to be maintained.
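A one-line check makes this explicit. Assuming the mass term has the standard form $-\tfrac{1}{2}m^2 A_\mu A^\mu$ (the normalization is an assumption here), its variation under $\delta A_\mu = \partial_\mu\epsilon$ is:

```latex
\begin{align*}
\delta\Big(-\tfrac{1}{2}\,m^2 A_\mu A^\mu\Big)
  &= -m^2 A^\mu\,\delta A_\mu
   = -m^2 A^\mu\,\partial_\mu\epsilon(x) , \\
% after integrating by parts in the action
\delta\!\int d^4x\,\Big(-\tfrac{1}{2}\,m^2 A_\mu A^\mu\Big)
  &= m^2\!\int d^4x\;\epsilon(x)\,\partial_\mu A^\mu(x) .
\end{align*}
```

Since $\epsilon(x)$ is arbitrary and $\partial_\mu A^\mu$ is not constrained to vanish, this variation is non-zero for $m \neq 0$, whereas the kinetic term built from $F_{\mu\nu}$ is unchanged.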
There are aspects of electrodynamics that obstruct a straightforward
canonical quantization of the electromagnetic field in the same way we
quantized theories of massive particles in the previous chapter. As
usual, we begin by defining the canonical momenta conjugate to the
vector potential $A_\mu$. For the
Maxwell–matter Lagrangian density
$$\mathcal{L} = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu} + \mathcal{L}_M,$$
the canonical momentum conjugate to $A_\mu$ is
$$\Pi^\mu \equiv \frac{\partial\mathcal{L}}{\partial(\partial_0 A_\mu)}. \tag{2.2.1}$$
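With no constraints, the “usual rules” would then impose the equal-time canonical commutators
$$[A_\mu(\mathbf{x},t), \Pi^\nu(\mathbf{y},t)] = i\,\delta_\mu^\nu\,\delta^3(\mathbf{x}-\mathbf{y}),\qquad [A_\mu(\mathbf{x},t), A_\nu(\mathbf{y},t)] = [\Pi^\mu(\mathbf{x},t), \Pi^\nu(\mathbf{y},t)] = 0.$$
However, this is not possible here, because $A_\mu$
and $\Pi^\mu$ are subject to several constraints. The first
(primary) constraint is immediate: the Lagrangian density contains no
time derivative of $A_0$, so
$$\Pi^0 = 0. \tag{2.2.2}$$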
This is a “primary constraint” in Dirac's sense: it follows
directly from the definition of canonical momenta. Demanding that a
primary constraint be preserved in time produces an additional
(secondary) constraint. The Euler–Lagrange equation obtained by
varying the action with respect to $A_0$ reads
$$\partial_i\Pi^i = -J^0. \tag{2.2.3}$$
Even though the matter Lagrangian may contain $A_0$
explicitly, the charge density $J^0$ is a functional
only of the canonical matter variables $Q^n$ and their
conjugate momenta $P_n$;
schematically,
$$J^0 = -i\sum_n P_n\,q_n\,Q^n. \tag{2.2.4}$$
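Hence (2.2.3) is a genuine functional relation among
canonical variables, not an equation involving time derivatives that
could be used to evolve data. The pair of constraints (2.2.2)
and (2.2.3) is incompatible with the naive equal-time
structure
$[A_\mu(\mathbf{x},t), \Pi^\nu(\mathbf{y},t)] = i\,\delta_\mu^\nu\,\delta^3(\mathbf{x}-\mathbf{y})$
and
$[Q^n(\mathbf{x},t), \Pi^\nu(\mathbf{y},t)] = [P_n(\mathbf{x},t), \Pi^\nu(\mathbf{y},t)] = 0$.
In particular, if $\Pi^0 = 0$ as an operator constraint, we
cannot at the same time have
$[A_0(\mathbf{x},t), \Pi^0(\mathbf{y},t)] = i\,\delta^3(\mathbf{x}-\mathbf{y})$;
and if Gauss's law $\partial_i\Pi^i = -J^0$ holds as an operator
identity, the longitudinal part of $\Pi^i$ is not an
independent degree of freedom and cannot satisfy free canonical
commutators with the longitudinal part of $A_i$.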
We saw a very similar obstruction already in the theory of a massive
vector field (the Proca field). There, too, the naive canonical
quantization rules were inconsistent because not all components of the
vector potential were independent dynamical
variables. In the massive case, we found two equivalent strategies to
resolve this: either to use Dirac brackets, or more directly, to treat
only the spatial components of the field and their conjugate
momenta $\pi_i$ as independent
canonical variables, and then use the analog of Gauss's law to solve for
the time component in terms of these. In that case, the constraint
was