The laws of physics are not meant to be democratic. They are not there to satisfy the opinions of majorities, but to rule imperially, regardless of our ignorance about their true form. Intuitively enough, the laws of physics should be the same for everybody, regardless of where we are, when we are, or how we are. This was stated by Einstein in 1905 in his paper “On the Electrodynamics of Moving Bodies”, together with the claim that there is a maximum speed allowed in the universe, that of light.

About the first statement we could argue for hours based on ontological arguments, experiments, observations, etc., and about the second, Einstein himself gave some interesting thought experiments that you can find recreated in many different versions, scattered through all the modern physics books written ever since. In this post we are just going to accept both statements and work our way from them. For convenience we are going to set the speed of light equal to one ($latex c=1&fg=000000$).

This is the foremost example of symmetries in physics: We change something in the system (observer) and we keep something invariant (the laws describing the system).

First we need to agree on what we mean when we say “changing the observer”, and then we can agree on what those laws are all about. Usually in physics, two observers are simply two different coordinate systems used to describe the events under study. For example, one observer will be a guy (you can call him Bob if you like, he is from Minnesota) with a coordinate system $latex (t,x,y,z)&fg=000000$, while another observer (you can call her Alice, like in the book) has a coordinate system $latex (t',x',y',z')&fg=000000$. Notice that I’m including time in the coordinates; this is because coordinate systems are used to label events occurring in physical reality, and for that we need not only a *where* but also a *when*.

Some formalities to simplify the analysis:

- Both Alice and Bob will agree on the origin of coordinates: this means that they synchronized their clocks at an instant when their spatial coordinate systems agreed on the origin. This may raise some practical complications (how are they supposed to actually do this?), but I assure you it makes at least mathematical sense; both of them agree that $latex (0,0,0,0)&fg=000000$ describes the same event.
- Since we are used to coordinates describing space, it may feel weird to have time among them, since they don’t even have the same units. There are different ways to deal with this, but usually, instead of time, the first coordinate is $latex ct&fg=000000$, meaning the distance traveled by light in that given time. However, I find it annoying to carry an extra letter around everywhere we need time, so we stick with the convention I mentioned before of setting the speed of light equal to one. Under this convention, time and distance are measured in the same units.

With these things settled we can focus on the second of our claims: the universal speed of light. Let’s imagine that somehow we can create a “sphere of light” at one point, say, the origin. This sphere will expand as time passes at, of course, the speed of light. At some time $latex t&fg=000000$, Bob will observe a sphere of radius $latex t&fg=000000$ (remember that $latex c=1&fg=000000$); any point $latex \vec{x}=(x,y,z)&fg=000000$ on that sphere of light obeys the following equation:

$latex |\vec{x}|^2=x^2+y^2+z^2=t^2&fg=000000$

This sphere makes a clear division of space: we have the inner part of the sphere, described by $latex |\vec{x}|^2<t^2&fg=000000$, and the outer part, $latex |\vec{x}|^2>t^2&fg=000000$. These are not only mathematically defined regions; they have a real physical meaning. Imagine that a particle is released at the origin at the same time as the sphere of light, and that it moves under some physical law. Since the speed of light is the fastest this thing can move, it cannot exceed the limit defined by the sphere of light. Thus, the inner region will be the only part of space in which this particle can be found, while the outer region consists of all the points that are impossible to reach by that time. Let’s rephrase this in a slightly fancier way:

Consider the function of space and time $latex f(t,\vec{x})=t^2-|\vec{x}|^2&fg=000000$. From the previous discussion, its positive level curves are the points where the particle could be found, the negative ones are the points where the particle cannot be, and the zero level curve corresponds to the points on the light sphere. With this prepared we can ask ourselves: what about Alice?
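If you like to see things in code, here is a minimal Python sketch of this classification, assuming the function is $latex f=t^2-|\vec{x}|^2&fg=000000$ with $latex c=1&fg=000000$. The names `interval` and `classify`, as well as the tolerance, are my own choices for illustration.

```python
def interval(t, x, y, z):
    """f = t^2 - |x|^2 in units where c = 1."""
    return t * t - (x * x + y * y + z * z)


def classify(t, x, y, z, tol=1e-12):
    """Say where an event sits relative to the light sphere emitted at the origin."""
    f = interval(t, x, y, z)
    if abs(f) <= tol:
        return "on the light sphere"
    return "reachable" if f > 0 else "unreachable"


# Three events at the same time t = 2, at increasing distance from the origin
for event in [(2.0, 1.0, 0.0, 0.0), (2.0, 2.0, 0.0, 0.0), (2.0, 3.0, 0.0, 0.0)]:
    print(event, classify(*event))
```

The sign of the function is all we need to tell the three regions apart.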

Well, Alice won’t necessarily have the same coordinates as Bob. Actually, if Alice’s coordinate system (also known as a reference frame) is moving, her perception of the space coordinates may be different, but since the speed of light is fixed, the time component will also need to be modified accordingly; our job here is to find out what those changes of coordinates look like.

Regardless of how twisted the relation between Alice’s coordinates and Bob’s may be, they both should agree on at least the sphere of light, because both of them see the same speed of light. Then, in Alice’s coordinates, the sphere is also $latex |\vec{x}'|^2=t'^2&fg=000000$. Furthermore, both Alice’s and Bob’s points of view follow the same physical rules, so if something is not physically possible in the eyes of Bob, it should not be possible for Alice either. In terms of our function $latex f=t^2-|\vec{x}|^2&fg=000000$: if Bob sees a point such that $latex f(t,\vec{x})<0&fg=000000$, it should also be the case for Alice that $latex f(t',\vec{x}')<0&fg=000000$. This can now be stated as a formal math problem:

We are looking for the coordinate changes under which the function $latex f&fg=000000$ keeps its sign and its zero set (the light sphere) stays fixed. However difficult this problem may be, we can impose a more stringent condition and look for the coordinate changes that leave the level curves of this function completely invariant, i.e. $latex f(t',\vec{x}')=f(t,\vec{x})&fg=000000$. Nothing is lost for our purposes, since any such change automatically satisfies the requirements discussed above.

Now, what changes could we make that leave this function invariant? Well, any change to the vector $latex \vec{x}&fg=000000$ that leaves its magnitude invariant will obviously leave this function fixed. Those familiar with linear algebra or group theory will recognize that this set of symmetries corresponds to the group $latex O(3)&fg=000000$. This group consists of all rotations in 3D space, together with the possible space reflections. This makes sense: no matter in which direction you fix your axes, your physical laws should be consistent.

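But we said that interesting stuff can happen with time. To see this, consider a vector like the following:

$latex X=\begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix}&fg=000000$

The coordinate changes that interest us can be represented by a matrix $latex \Lambda&fg=000000$ such that:

$latex X'=\Lambda X&fg=000000$

The transformations belonging to $latex O(3)&fg=000000$ only affect the spatial components of such a vector, so we can write them as:

$latex \Lambda=\begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix}&fg=000000$

where $latex R&fg=000000$ is a 3-by-3 matrix such that $latex R^TR=I&fg=000000$; this is the defining condition of the usual representation of $latex O(3)&fg=000000$ by real matrices. If you are not very familiar with this part, don’t worry, it is not that important for the rest of the discussion.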
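As a quick sanity check of the spatial symmetries, here is a small Python sketch with one particular rotation (about the $latex z&fg=000000$ axis) applied to an event. The function names, the angle and the sample event are arbitrary choices of mine, and I am assuming the invariant function is $latex t^2-|\vec{x}|^2&fg=000000$.

```python
import math


def rotate_about_z(event, theta):
    """Rotate the spatial part of (t, x, y, z) by theta around the z axis."""
    t, x, y, z = event
    c, s = math.cos(theta), math.sin(theta)
    return (t, c * x - s * y, s * x + c * y, z)


def interval(event):
    """t^2 - |x|^2 in units where c = 1."""
    t, x, y, z = event
    return t * t - (x * x + y * y + z * z)


event = (5.0, 1.0, 2.0, 3.0)
rotated = rotate_about_z(event, 0.7)
# Both numbers should agree up to floating-point rounding
print(interval(event), interval(rotated))
```

The time component is untouched and the spatial magnitude is preserved, so the function keeps its value.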

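For the transformations involving both time and space, we can fix an associated direction in space which, without loss of generality, will be taken to be the $latex x&fg=000000$ axis. This reduces our coordinate change problem to finding the 2-by-2 matrices such that:

$latex \begin{pmatrix} t' \\ x' \end{pmatrix}=\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} t \\ x \end{pmatrix}&fg=000000$

where we ignore the other two coordinates, assuming them fixed. The following may be a little annoying, but it is just algebra. It is not that hard to convince oneself that the function can be rewritten as $latex t^2-x^2=\begin{pmatrix} t & x \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} t \\ x \end{pmatrix}&fg=000000$. With this we can see that the transformations leaving this function invariant are such that:

$latex \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}=\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}&fg=000000$

which is equivalent to the following system of equations:

$latex a^2-c^2=1,\qquad ab-cd=0,\qquad b^2-d^2=-1&fg=000000$

(if you don’t like the matrix treatment, you can simply substitute the change of variables and compare coefficients; this yields the same system of equations). To solve these equations we can add the first one and the third one, plus two times the second one, which gets us to perfect squares:

$latex (a+b)^2-(c+d)^2=0&fg=000000$

This implies that one of the following cases must hold (the second condition in each is just the second equation of the system):

- Proper case: $latex a+b=c+d&fg=000000$ and $latex ab=cd&fg=000000$
- Improper case: $latex a+b=-(c+d)&fg=000000$ and $latex ab=cd&fg=000000$

Why proper and improper? I’ll tell you later. For the proper case the thing is quite simple: applying Vieta’s formulas we can see that the pair $latex a,b&fg=000000$ and the pair $latex c,d&fg=000000$ correspond, up to ordering, to the roots of the polynomial $latex P(\lambda)=\lambda^2-(a+b)\lambda+ab&fg=000000$. This, together with the fact that $latex a^2-c^2=1&fg=000000$ (see the first equation of the system), which forbids $latex a=c&fg=000000$, tells us directly that for the proper case $latex d=a&fg=000000$ and $latex c=b&fg=000000$. For the improper case we can proceed similarly, noting that the previous polynomial has the following parity relation: $latex P(-\lambda)=\lambda^2+(a+b)\lambda+ab&fg=000000$, whose roots are $latex -a&fg=000000$ and $latex -b&fg=000000$, yielding as a result that for the improper case $latex d=-a&fg=000000$ and $latex c=-b&fg=000000$. Putting this together we obtain two coordinate changes that involve time and the $latex x&fg=000000$ direction:

- Proper transformation: $latex \begin{pmatrix} a & b \\ b & a \end{pmatrix}&fg=000000$, where $latex a^2-b^2=1&fg=000000$.
- Improper transformation: $latex \begin{pmatrix} a & b \\ -b & -a \end{pmatrix}&fg=000000$, where $latex a^2-b^2=1&fg=000000$.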

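Since we haven’t imposed any restriction on the off-diagonal entry $latex b&fg=000000$ of these matrices, in principle it can take any real value, which allows for a nice change of variables. Consider the hyperbolic sine function: it gives a bijection of the real numbers onto themselves, so we can write $latex b=-\sinh\varphi&fg=000000$ (the minus sign is not important; you can leave it out, since $latex b&fg=000000$ gets squared in the restriction; I just like to have it there by convention). This, combined with the restriction $latex a^2-b^2=1&fg=000000$, gives $latex a=\cosh\varphi&fg=000000$, where we pick the positive root so that time keeps its direction. Then our changes of coordinates simply look as follows:

- Proper transformation: $latex \begin{pmatrix} \cosh\varphi & -\sinh\varphi \\ -\sinh\varphi & \cosh\varphi \end{pmatrix}&fg=000000$.
- Improper transformation: $latex \begin{pmatrix} \cosh\varphi & -\sinh\varphi \\ \sinh\varphi & -\cosh\varphi \end{pmatrix}&fg=000000$.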

What’s the point of having both types of transformation? Not much really. As you can easily check, the main difference between them is the fact that proper transformations have determinant 1, while improper ones have determinant -1. The improper ones can be accounted for by combining a proper transformation with a reflection with respect to the $latex yz&fg=000000$ plane (that is, $latex x\to -x&fg=000000$), so there is no point in carrying them around; we will stay only with the proper ones.
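Both claims are easy to check numerically. The following is only a sketch: I am assuming the proper transformation has rows $latex (\cosh\varphi,-\sinh\varphi)&fg=000000$ and $latex (-\sinh\varphi,\cosh\varphi)&fg=000000$, and that the improper one is the same matrix with the sign of its second row flipped. The helper names and the sample values are mine.

```python
import math


def proper_boost(phi):
    # Assumed form: [[cosh, -sinh], [-sinh, cosh]]
    return [[math.cosh(phi), -math.sinh(phi)],
            [-math.sinh(phi), math.cosh(phi)]]


def improper_boost(phi):
    # Assumed form: the proper boost followed by the reflection x -> -x
    return [[math.cosh(phi), -math.sinh(phi)],
            [math.sinh(phi), -math.cosh(phi)]]


def apply(matrix, event):
    """Multiply a 2-by-2 matrix by the column vector (t, x)."""
    t, x = event
    return (matrix[0][0] * t + matrix[0][1] * x,
            matrix[1][0] * t + matrix[1][1] * x)


def interval(event):
    t, x = event
    return t * t - x * x


def det(matrix):
    return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]


phi = 0.8
event = (3.0, 1.0)
for name, boost in [("proper", proper_boost(phi)), ("improper", improper_boost(phi))]:
    print(name, "det =", det(boost), "interval =", interval(apply(boost, event)))
```

Both matrices keep $latex t^2-x^2&fg=000000$ fixed; only the sign of the determinant tells them apart.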

Where’s my physics? Well, imagine that Alice and Bob are at the origin when their clocks start running. Then Alice starts moving away from Bob along the $latex x&fg=000000$ axis with a velocity $latex \beta&fg=000000$. Bob, at a certain time $latex t&fg=000000$, will see that Alice’s position is $latex x=\beta t&fg=000000$. Thus, Bob will assign to Alice a vector $latex (t,\beta t)&fg=000000$. However, Alice is not moving away from Alice (of course not!), so even if her clock is registering a time $latex t'&fg=000000$, her position will still be $latex x'=0&fg=000000$ (for her, Bob is the one moving away). This means that she will assign herself a vector $latex (t',0)&fg=000000$. Using our coordinate change matrix we can relate those two vectors, in particular their second components:

$latex 0=-t\sinh\varphi+\beta t\cosh\varphi \quad\Rightarrow\quad \beta=\tanh\varphi&fg=000000$

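With this in mind, and keeping in mind some hyperbolic identities, we can write our coordinate change matrix in terms of the physical parameters of the situation:

$latex \begin{pmatrix} \cosh\varphi & -\sinh\varphi \\ -\sinh\varphi & \cosh\varphi \end{pmatrix}=\begin{pmatrix} \gamma & -\gamma\beta \\ -\gamma\beta & \gamma \end{pmatrix}&fg=000000$,

where we introduced the Lorentz factor $latex \gamma=\frac{1}{\sqrt{1-\beta^2}}&fg=000000$.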
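Here is a hedged Python sketch of the Alice and Bob situation, assuming the boost along $latex x&fg=000000$ has rows $latex (\gamma,-\gamma\beta)&fg=000000$ and $latex (-\gamma\beta,\gamma)&fg=000000$ with $latex \gamma=1/\sqrt{1-\beta^2}&fg=000000$. The value $latex \beta=0.6&fg=000000$ and the function names are just illustrative choices of mine.

```python
import math


def lorentz_factor(beta):
    """gamma = 1 / sqrt(1 - beta^2), with c = 1 and |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def boost(beta):
    """Boost along x written with gamma and beta instead of cosh and sinh."""
    g = lorentz_factor(beta)
    return [[g, -g * beta], [-g * beta, g]]


def apply(matrix, event):
    t, x = event
    return (matrix[0][0] * t + matrix[0][1] * x,
            matrix[1][0] * t + matrix[1][1] * x)


beta = 0.6                # Alice's velocity as seen by Bob (illustrative value)
phi = math.atanh(beta)    # the corresponding hyperbolic parameter
print(lorentz_factor(beta), math.cosh(phi))

# Bob's description of Alice after t = 5: the event (t, beta * t)
t_alice, x_alice = apply(boost(beta), (5.0, beta * 5.0))
print(t_alice, x_alice)   # x_alice should vanish up to rounding
```

In her own coordinates Alice stays at the origin, and her clock shows less time than Bob’s.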

This line of thought also gives us an insight into the meaning of the function that we were so worried about keeping fixed. If an observer, like Alice, is studying her own history, her position vector won’t change, meaning that the function reduces to the square of her time component, $latex t'^2&fg=000000$. This hints at a different labeling of the function, namely $latex \tau^2=t^2-|\vec{x}|^2&fg=000000$, where the Greek letter $latex \tau&fg=000000$ is referred to as proper time. Moreover, we can now get rid of the requirement of a fixed origin by considering space-time displacements instead (replacing $latex t&fg=000000$ by $latex \Delta t&fg=000000$ and $latex \vec{x}&fg=000000$ by $latex \Delta\vec{x}&fg=000000$), leading to a conserved quantity $latex \Delta\tau^2=\Delta t^2-|\Delta\vec{x}|^2&fg=000000$. This means that the proper time between two points in space-time (referred to as events) tells us the time interval as seen by an observer who travels along that interval. This now sounds more reasonable: if everybody agrees on the laws of physics and the appropriate change of coordinates, then everybody should agree on what a guy following a given trajectory will measure (namely, what his watch should tell).

The concept of proper time and intervals opens the door to certain considerations. For example, we could ask ourselves how the proper time of an observed interval differs from our own notion of time. That is, for a fixed initial point, how the proper time interval varies:

$latex \left(\frac{\Delta\tau}{\Delta t}\right)^2=1-\left|\frac{\Delta\vec{x}}{\Delta t}\right|^2&fg=000000$.

For a small enough interval, the quotient $latex \Delta\vec{x}/\Delta t&fg=000000$ tends to the velocity $latex \vec{\beta}&fg=000000$ of a particle going from one event to the other. Inverting this expression we obtain an interesting identity:

$latex \frac{dt}{d\tau}=\frac{1}{\sqrt{1-|\vec{\beta}|^2}}=\gamma&fg=000000$.

The interesting part of this expression is that it gives the Lorentz factor a physical interpretation: it “measures” the discrepancy between our clock and that of an observer following a given trajectory we are interested in. If the observer is not moving with respect to us, the Lorentz factor is equal to one and no discrepancy is found, but as soon as the observer starts moving, the Lorentz factor increases, and we measure times longer than the ones lived along the trajectory.
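To put some numbers on this discrepancy, here is a tiny Python sketch assuming uniform motion and $latex c=1&fg=000000$, so that the traveler’s time is our time divided by the Lorentz factor. The function name and the sample speeds are my own.

```python
import math


def proper_time(dt, beta):
    """Time elapsed on a clock moving at speed beta while ours ticks dt (c = 1)."""
    return dt * math.sqrt(1.0 - beta * beta)


dt = 10.0
for beta in (0.0, 0.1, 0.8, 0.99):
    tau = proper_time(dt, beta)
    print(f"beta={beta}: our time {dt}, traveler's time {tau:.4f}, ratio {dt / tau:.4f}")
```

The ratio in the last column is the Lorentz factor: it equals one at rest and grows as the speed approaches that of light.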

Since we have discrepancies in time, we may also have discrepancies in other quantities derived from it, like the measurement of velocity itself. What would a velocity vector look like if, instead of differentiating position with respect to time, we did it with respect to proper time? I’ll use another letter to avoid confusion:

$latex \vec{u}=\frac{d\vec{x}}{d\tau}=\frac{dt}{d\tau}\frac{d\vec{x}}{dt}=\gamma\vec{\beta}&fg=000000$.

Moreover, if we consider the vector $latex U=\left(\frac{dt}{d\tau},\frac{d\vec{x}}{d\tau}\right)=(\gamma,\gamma\vec{\beta})&fg=000000$, the coordinate change will leave the proper time invariant while changing the position according to the above discussion. From this it follows directly that, for a coordinate change matrix $latex \Lambda&fg=000000$, this new vector transforms like the position vector:

$latex U'=\Lambda U&fg=000000$

Objects that transform like this are called 4-vectors (yes, super creative); this one in particular is called the 4-velocity. Coordinate change matrices that involve both space and time, like the ones studied here, are called Lorentz boosts. With kinematics more or less in place, we can now think about dynamics.
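A small Python sketch can illustrate what “transforming as a 4-vector” buys us. I am assuming motion along $latex x&fg=000000$ only, a 4-velocity of the form $latex (\gamma,\gamma\beta)&fg=000000$ and a boost with rows $latex (\gamma,-\gamma v)&fg=000000$ and $latex (-\gamma v,\gamma)&fg=000000$; the names and the sample velocities are mine.

```python
import math


def four_velocity(beta):
    """(gamma, gamma * beta) for motion along x, with c = 1."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return (g, g * beta)


def boost(vector, v):
    """Apply the boost with relative velocity v along x to a (time, x) pair."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    u0, u1 = vector
    return (g * u0 - g * v * u1, -g * v * u0 + g * u1)


def minkowski_norm_sq(vector):
    """First component squared minus spatial component squared."""
    u0, u1 = vector
    return u0 * u0 - u1 * u1


u = four_velocity(0.5)
u_boosted = boost(u, 0.3)
# Both values should equal 1 up to rounding
print(minkowski_norm_sq(u), minkowski_norm_sq(u_boosted))
```

The components change from one observer to the other, but the combination “time part squared minus space part squared” does not.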

In the previous post I discussed the Lagrangian formalism. In this approach, the dynamics of the system are derived directly from the associated action. Because of this, two systems having the same action will follow the same physical laws. However, the action is constructed from the Lagrangian, which is a function of space and time, so what happens if we change coordinates? Well, a change of coordinates will yield a new Lagrangian. How can two Lagrangians describing the same system give the same physics? Well, they could be the same Lagrangian! As long as the associated action stays invariant under coordinate changes, all observers will agree on the physics derived from it. Functions of space and time that stay invariant in this way are called scalars or “Lorentz scalars”.

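We would like to construct a free-body action which behaves as a Lorentz scalar, and we already know of something that behaves as such: the proper time interval, $latex d\tau=\sqrt{1-|\vec{\beta}|^2}\,dt&fg=000000$. We can use it to construct an action, integrating over differential intervals to ensure it stays Lorentz-invariant:

$latex S=-m\int d\tau=-m\int\sqrt{1-|\vec{\beta}|^2}\,dt \quad\Rightarrow\quad L=-m\sqrt{1-|\vec{\beta}|^2}&fg=000000$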

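Why did I include the mass? Well, even if we are ignoring the time-length units (time and length are measured in the same units because $latex c=1&fg=000000$), we still need a mass factor to account for the units of the action. And what about the minus sign? Mostly convention. If you do a series expansion of the Lagrangian $latex L=-m\sqrt{1-|\vec{\beta}|^2}&fg=000000$ in the low-speed limit, you obtain $latex L\approx -m+\frac{1}{2}m|\vec{\beta}|^2&fg=000000$, whose second term corresponds to the classical kinetic energy $latex \frac{1}{2}m|\vec{\beta}|^2&fg=000000$; the minus sign is only there to agree with this.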
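The low-speed claim can be checked numerically. This is a sketch assuming the free-particle Lagrangian is $latex L=-m\sqrt{1-\beta^2}&fg=000000$ with $latex c=1&fg=000000$; the function names and the sample speeds are illustrative choices of mine.

```python
import math


def lagrangian(beta, m=1.0):
    """Assumed free-particle Lagrangian L = -m * sqrt(1 - beta^2), with c = 1."""
    return -m * math.sqrt(1.0 - beta * beta)


def low_speed_lagrangian(beta, m=1.0):
    """Constant -m plus the classical kinetic energy m * beta^2 / 2."""
    return -m + 0.5 * m * beta * beta


for beta in (0.001, 0.01, 0.1, 0.5):
    exact = lagrangian(beta)
    approx = low_speed_lagrangian(beta)
    print(f"beta={beta}: exact={exact:.10f} approx={approx:.10f} diff={exact - approx:.2e}")
```

The difference shrinks very quickly as the speed goes down, which is why we never notice it in daily life. The constant $latex -m&fg=000000$ does not affect the equations of motion.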

Having a Lagrangian, we can now get the equations of motion. We can start by deriving the canonical momenta:

$latex p_x=\frac{\partial L}{\partial\beta_x}=\frac{m\beta_x}{\sqrt{1-|\vec{\beta}|^2}}=\gamma m\beta_x&fg=000000$

Similarly for the other components:

$latex \vec{p}=\gamma m\vec{\beta}=\gamma\vec{p}_0&fg=000000$

where $latex \vec{p}_0=m\vec{\beta}&fg=000000$ is the non-relativistic momentum that we are used to. Once again the Lorentz factor gives us the correction with respect to our usual mechanics. It is easy to see that, since the Lagrangian has no position dependence, our equations of motion yield a constant momentum and, because of the expression we obtained, this describes a particle moving with constant velocity, as expected for free motion.
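If you don’t trust the derivative, you can let the computer take it. The sketch below assumes one-dimensional motion and the Lagrangian $latex L=-m\sqrt{1-\beta^2}&fg=000000$, and compares a central finite difference with the closed form $latex \gamma m\beta&fg=000000$. The step size `h` and the sample values are arbitrary choices of mine.

```python
import math


def lagrangian(beta, m):
    """Assumed free-particle Lagrangian, with c = 1."""
    return -m * math.sqrt(1.0 - beta * beta)


def momentum_numeric(beta, m, h=1e-6):
    """Central finite difference of L with respect to the velocity."""
    return (lagrangian(beta + h, m) - lagrangian(beta - h, m)) / (2.0 * h)


def momentum_closed_form(beta, m):
    """gamma * m * beta."""
    return m * beta / math.sqrt(1.0 - beta * beta)


m, beta = 2.0, 0.6
# Numerical derivative, closed form, and the non-relativistic m * beta
print(momentum_numeric(beta, m), momentum_closed_form(beta, m), m * beta)
```

The first two numbers agree up to the finite-difference error, and both are larger than the non-relativistic value by the Lorentz factor.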

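Is that it? No, don’t forget about the energy. Since our Lagrangian does not depend explicitly on time, we expect the associated Hamiltonian to be a conserved quantity. Now that we know the momenta, we can proceed and calculate it:

$latex E=\vec{p}\cdot\vec{\beta}-L=\gamma m|\vec{\beta}|^2+m\sqrt{1-|\vec{\beta}|^2}&fg=000000$

and by using the equivalence with hyperbolic functions ($latex |\vec{\beta}|=\tanh\varphi&fg=000000$, $latex \gamma=\cosh\varphi&fg=000000$):

$latex E=m\left(\sinh\varphi\tanh\varphi+\frac{1}{\cosh\varphi}\right)=m\cosh\varphi=\gamma m&fg=000000$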

At first glance this expression shouldn’t be that impressive, since it has the “expected” behavior: the Lorentz factor is a function of velocity, and this equation tells us that the faster you move, the more energy is involved. Nothing wrong so far. But now consider the vector $latex (E,\vec{p})=(\gamma m,\gamma m\vec{\beta})&fg=000000$. It is not hard to convince yourself that this is exactly the same as the 4-velocity, except for the mass factor. Since the only difference is a multiplicative constant, we confirm this is also a 4-vector, called the 4-momentum (yeah, I know, it was not that hard to figure out those names).
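The Hamiltonian computation can also be verified numerically. This sketch assumes one-dimensional motion, $latex L=-m\sqrt{1-\beta^2}&fg=000000$ and $latex p=\gamma m\beta&fg=000000$, and simply evaluates $latex p\beta-L&fg=000000$ next to $latex \gamma m&fg=000000$; the names and numbers are mine.

```python
import math


def energy(beta, m):
    """Hamiltonian E = p * beta - L for the assumed free-particle Lagrangian."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    p = gamma * m * beta                          # canonical momentum
    lagrangian = -m * math.sqrt(1.0 - beta * beta)
    return p * beta - lagrangian


m = 2.0
for beta in (0.0, 0.6, 0.9):
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    # The two printed energies should agree up to rounding
    print(beta, energy(beta, m), gamma * m)
```

Notice that the energy does not vanish at zero velocity: it starts at the value of the mass.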

Being a 4-vector, it should behave the same as the 4-displacement vector, i.e. everybody should agree on the value obtained by squaring its first component and subtracting the squared magnitude of the rest of the vector. This quantity is called the Minkowski norm squared. What’s its value for our 4-momentum?

$latex E^2-|\vec{p}|^2=\gamma^2m^2\left(1-|\vec{\beta}|^2\right)=m^2&fg=000000$

This is a fairly beautiful and kinda familiar equation. To see this, notice that we can rewrite it as $latex E^2=m^2+|\vec{p}|^2&fg=000000$. This is like some kind of relativistic Pythagorean theorem. Moreover, if the thing we are studying is at rest with respect to us ($latex \vec{p}=0&fg=000000$), this expression tells us something interesting: $latex E=m&fg=000000$, or more familiarly, if we hadn’t omitted the speed of light everywhere:

$latex E=mc^2&fg=000000$.

This implies that matter has an inherent energy associated with it just because of its existence, apart from its dynamics. These equations also help us explain why matter cannot travel faster than light (the reason behind the light sphere limit). As we approach the speed of light, $latex |\vec{\beta}|&fg=000000$ tends to 1; the Lorentz factor diverges there, leading to a divergent energy, i.e. we would need an infinite amount of energy to reach such a velocity.
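A last numerical sketch, assuming motion along one axis, $latex c=1&fg=000000$ and a 4-momentum of the form $latex (\gamma m,\gamma m\beta)&fg=000000$: it shows the energy growing without bound while $latex E^2-p^2&fg=000000$ stays put. The function name and the list of speeds are my own choices.

```python
import math


def four_momentum(beta, m):
    """(E, p) = (gamma * m, gamma * m * beta) for motion along x, with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return (gamma * m, gamma * m * beta)


m = 1.0
for beta in (0.0, 0.9, 0.99, 0.999, 0.9999):
    energy, p = four_momentum(beta, m)
    print(f"beta={beta}: E={energy:.3f}  E^2 - p^2={energy * energy - p * p:.6f}")
```

Each extra 9 in the speed costs more and more energy, while the Minkowski norm squared keeps returning the mass squared.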

This was an introduction to the Special Theory of Relativity. I wrote it moved by the beauty of its own development. In some courses we receive the Lorentz transformations and some 4-vectors as definitions, for convenience or to save exposition time; however, I think it is important to see how the concepts arise simply from the symmetry considerations presented here. Nowadays we use more sophisticated methods to do this. Once we agree on the need to preserve the Minkowski norm, we can study the algebraic structure of the symmetry group that does it. By deriving its associated Lie algebra we can easily obtain its group representation in four-dimensional space. This yields the same boost matrices that we derived here, together with the rotations in space; this is a powerful approach, since it allows us to connect with other new concepts that have appeared in physics in the last century. However, such an approach is not intuitive at all and can obscure the physics, making it a bad first encounter because of its “axiomatic” appearance.

So… what’s the glitch in reality? There is no such thing, at least not in this classical (non-quantum) depiction of special relativity. The glitch mostly comes from our traditional perceptions. Our views on physical science before the creation of SR assumed certain concepts, such as a universal time and certain coordinate changes, based on what sounded reasonable. The problem is that these ideas came from our daily-life experience. Our thinking evolved in a low-speed world, so it is understandable that our thoughts failed when dealing with high-speed matter; things like matter being energy or time measurements being stretched were plain nonsense. It was not until we stressed the ideas of symmetry and universality of physical laws that we started to humbly recognize the glitch in our perception, opening the doors to a better understanding of reality.