A probability distribution describes the likelihood of each possible outcome for a random variable. It is a fundamental concept in statistics that tells us how probabilities are distributed over the possible values the variable can take. Understanding distributions allows us to move from simply describing data to quantifying uncertainty and making probabilistic forecasts, the very core of a quantitative trading endeavour.
In our previous lesson on random variables, we distinguished between discrete and continuous variables. This distinction gives rise to two families of probability distributions, each with its own mathematical language.
A discrete random variable has a finite or countably infinite number of possible values, like the number of ticks a stock moves in a minute. Its distribution is described by a Probability Mass Function (PMF), which assigns a specific probability to each exact outcome.
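For a valid PMF $p(x) = P(X = x)$, two conditions must hold: every probability is non-negative, $p(x_i) \ge 0$, and the probabilities sum to one, $\sum_i p(x_i) = 1$.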
This is a valid PMF as the probabilities are non-negative and sum to 1. The expected outcome (mean) is $E[X] = \sum_i x_i\,p(x_i)$. Our expectation is a slight positive drift post-announcement.
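The validity check and the mean calculation can be sketched in a few lines of Python. The tick outcomes and probabilities below are hypothetical placeholders chosen for illustration, not the figures from the example above, and `is_valid_pmf` and `pmf_mean` are names introduced here.

```python
import math

# Hypothetical PMF: tick move -> probability (illustrative values only).
pmf = {-2: 0.10, -1: 0.20, 0: 0.35, 1: 0.25, 2: 0.10}

def is_valid_pmf(pmf, tol=1e-9):
    """Check non-negativity and that the probabilities sum to 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

def pmf_mean(pmf):
    """Expected value E[X] = sum of x * p(x) over all outcomes."""
    return sum(x * p for x, p in pmf.items())

print(is_valid_pmf(pmf), pmf_mean(pmf))
```

With these assumed numbers the mean is a small positive value, which is the sense in which a PMF can encode a slight upward drift.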
A continuous random variable can take any value within a given range, such as the daily percentage return of NIFTY. We cannot assign a non-zero probability to a single point, as any interval contains uncountably many points. Instead, we describe its distribution using a Probability Density Function (PDF), denoted $f(x)$. The PDF’s value is not a probability; it represents probability *density*. To get a probability, we must integrate the PDF over an interval:
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
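To make the density-versus-probability distinction concrete, here is a minimal sketch that integrates a PDF numerically and compares the result with the difference of CDF values. The normal model and its parameters are assumptions chosen purely for illustration; `integrate_pdf` is a helper introduced here.

```python
from statistics import NormalDist

def integrate_pdf(dist, a, b, steps=10_000):
    """Approximate P(a <= X <= b) with the midpoint rule on the PDF."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Assumed daily-return model: mean 0.05%, standard deviation 1% (illustrative).
returns = NormalDist(mu=0.0005, sigma=0.01)

prob_numeric = integrate_pdf(returns, -0.01, 0.01)      # area under the PDF
prob_exact = returns.cdf(0.01) - returns.cdf(-0.01)     # same probability via the CDF
density_at_zero = returns.pdf(0.0)  # a density, which can exceed 1

print(prob_numeric, prob_exact, density_at_zero)
```

The density at zero is far larger than 1 here, which is fine: only the integrated area has to lie between 0 and 1.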
For both discrete and continuous variables, the Cumulative Distribution Function (CDF), denoted $F(x)$, provides a unified way to describe the distribution. It gives the total probability that the variable takes on a value less than or equal to $x$: $F(x) = P(X \le x)$.
For a continuous variable, $F(x) = \int_{-\infty}^{x} f(t)\,dt$, and consequently, the PDF is the derivative of the CDF: $f(x) = \frac{dF(x)}{dx}$. The CDF is non-decreasing and always ranges from 0 to 1, making it one of the most fundamental objects in probability theory.
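The derivative relationship can be checked numerically. The sketch below, which assumes a zero-mean normal return model with 1% volatility for illustration only, compares a finite-difference slope of the CDF with the PDF and reads a tail probability directly off the CDF.

```python
from statistics import NormalDist

returns = NormalDist(mu=0.0, sigma=0.01)  # assumed illustrative model

def cdf_slope(dist, x, h=1e-6):
    """Central finite difference of the CDF, approximating f(x) = dF/dx."""
    return (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)

x = 0.005
slope = cdf_slope(returns, x)   # numerical derivative of F at x
density = returns.pdf(x)        # analytic PDF at x

# Reading a tail probability off the CDF: P(X <= -2%) under this model.
prob_drop = returns.cdf(-0.02)

print(slope, density, prob_drop)
```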

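What this means for a trader: The CDF is directly applicable to risk management. For a continuous return $X$, the probability of a loss exceeding some value $L$ is $P(X < -L) = F(-L)$.
The Moment Generating Function (MGF) of a random variable $X$, denoted $M_X(t)$, is defined as the expected value of $e^{tX}$:
$$M_X(t) = E\left[e^{tX}\right]$$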
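The magic of the MGF lies in its Taylor series expansion:
$$M_X(t) = E\left[e^{tX}\right] = 1 + t\,E[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots$$
By differentiating the MGF $n$ times with respect to $t$ and evaluating at $t = 0$, we can extract the raw moments ($E[X^n] = M_X^{(n)}(0)$).
For example, the first moment (mean) is $E[X] = M_X'(0)$, and the second raw moment is $E[X^2] = M_X''(0)$, from which we get the variance: $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.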
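As a worked example, take a normal variable $X \sim N(\mu, \sigma^2)$, whose MGF is a standard result. Differentiating twice and evaluating at $t = 0$ recovers the familiar mean and variance:

```latex
\begin{aligned}
M_X(t)   &= \exp\!\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right) \\
M_X'(t)  &= (\mu + \sigma^2 t)\, M_X(t)
          &&\Rightarrow\; E[X] = M_X'(0) = \mu \\
M_X''(t) &= \left[\sigma^2 + (\mu + \sigma^2 t)^2\right] M_X(t)
          &&\Rightarrow\; E[X^2] = M_X''(0) = \sigma^2 + \mu^2 \\
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 = \sigma^2
\end{aligned}
```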
Not all distributions have a well-defined MGF (it might not converge). A more universally applicable tool is the Characteristic Function (CF), $\varphi_X(t)$, which always exists. It’s defined using a complex exponential:
$$\varphi_X(t) = E\left[e^{itX}\right]$$
The CF has similar properties to the MGF and is central to advanced probability theory, particularly for working with sums of random variables and distributions like the Cauchy that lack finite moments.
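A quick way to see that the CF exists even when moments do not is to estimate it from simulated Cauchy draws. This is a sketch under stated assumptions: the seed, sample size, and the helper `empirical_cf` are chosen here for illustration, and the comparison uses the known result that the standard Cauchy CF is $e^{-|t|}$.

```python
import cmath
import math
import random

def empirical_cf(samples, t):
    """Sample average of exp(i * t * x), an estimate of E[exp(itX)]."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Standard Cauchy draws via the inverse CDF: tan(pi * (U - 0.5)).
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20_000)]

t = 1.0
estimate = empirical_cf(cauchy, t)
exact = math.exp(-abs(t))  # known CF of the standard Cauchy distribution

print(estimate, exact)
```

Each term $e^{itx}$ has modulus 1, so the average is always bounded, whereas the sample mean of the same Cauchy draws never settles down.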

Distributions are often summarised by a set of parameters known as moments. These describe the shape and location of the distribution. For traders, the first four are the most critical for understanding risk and return: the mean (location), the variance (dispersion), the skewness (asymmetry), and the kurtosis (tail heaviness).
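The first four moments can be estimated from a sample with a few lines of Python. This sketch uses simple divide-by-n estimators for clarity; the return series is hypothetical and `sample_moments` is a name introduced here.

```python
def sample_moments(data):
    """Return mean, variance, skewness and excess kurtosis of a sample.

    Uses simple population-style (divide by n) estimators for clarity.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mean, m2, skewness, excess_kurtosis

# Hypothetical daily returns with one large down day (illustrative only).
returns = [0.004, -0.002, 0.001, 0.003, -0.001, 0.002, -0.045, 0.005]
mean, variance, skew, ex_kurt = sample_moments(returns)

print(mean, variance, skew, ex_kurt)
```

A single large down day is enough to push the skewness negative and the excess kurtosis positive, which is why these two moments matter for risk.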
In practice, we never know the true underlying probability distribution of a financial random variable like the daily returns of ICICI Bank. All we have is a finite sample of historical data. From this sample, we can construct an empirical distribution. A histogram is the most common way to visualise the empirical PDF.
The Law of Large Numbers gives us theoretical comfort: as we collect more data (i.e., as the sample size $n \to \infty$), the empirical distribution will converge to the true, unknown underlying distribution, provided the observations keep coming from that same distribution.
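The convergence can be illustrated by simulation. In this sketch a normal distribution stands in for the "unknown truth" (an assumption made only so the gap can be measured), and `max_cdf_gap` is a helper introduced here that computes the largest distance between the empirical and true CDFs.

```python
import random
from statistics import NormalDist

def max_cdf_gap(samples, dist):
    """Largest gap between the empirical CDF and a reference CDF."""
    ordered = sorted(samples)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        true_cdf = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs((i + 1) / n - true_cdf), abs(i / n - true_cdf))
    return gap

rng = random.Random(7)  # fixed seed for reproducibility
true_dist = NormalDist(mu=0.0, sigma=0.01)  # stands in for the unknown truth

gaps = {}
for n in (100, 10_000):
    sample = [rng.gauss(true_dist.mean, true_dist.stdev) for _ in range(n)]
    gaps[n] = max_cdf_gap(sample, true_dist)

print(gaps)
```

The gap shrinks as the sample grows, but only because every draw here comes from one fixed distribution; real market data offers no such guarantee.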

What this means for a trader: All our backtesting and risk modelling is done on the empirical distribution. The core assumption is that this historical distribution is a good proxy for the future distribution. This assumption (stationarity) often breaks down, especially during market regime changes, which is a primary source of model failure.
The normal (or Gaussian) distribution is the cornerstone of modern finance, underpinning models like Black-Scholes. It is mathematically elegant and provides a decent first approximation for financial returns. However, the strict assumption of normality is demonstrably false and dangerously misleading.
When we plot a histogram of NIFTY daily returns against a fitted normal distribution, we see a close match near the center. But the empirical distribution shows two critical deviations:

Why do returns have fat tails? One powerful explanation is the concept of a mixture distribution. The market doesn’t operate in a single, static mode. Instead, it switches between different “regimes”—typically a low-volatility “calm” regime and a high-volatility “panic” regime.
The overall distribution of returns is a probability-weighted average (a mixture) of the distributions from these two regimes. The panic regime, even if it occurs only 5-10% of the time, contributes its high variance to the mixture, fattening the tails of the combined distribution.
This model intuitively captures market reality: long periods of quiet punctuated by short, violent bursts of activity. Within the model, regime switching is the mathematical source of fat tails.
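The fattening effect can be computed exactly for a two-regime normal mixture. The regime probabilities and volatilities below are assumptions chosen for illustration; the moment formulas are the standard ones for a zero-mean normal mixture.

```python
import math
from statistics import NormalDist

# Assumed regimes (illustrative): (probability, daily volatility), zero mean.
regimes = [(0.90, 0.01), (0.10, 0.03)]

# Moments of a zero-mean normal mixture: E[X^2] = sum p*s^2, E[X^4] = 3*sum p*s^4.
var_mix = sum(p * s ** 2 for p, s in regimes)
fourth_mix = 3 * sum(p * s ** 4 for p, s in regimes)
excess_kurtosis = fourth_mix / var_mix ** 2 - 3  # 0 for a single normal

def two_sided_tail(sigma, threshold):
    """P(|X| > threshold) for X ~ N(0, sigma^2)."""
    return 2 * NormalDist(0.0, sigma).cdf(-threshold)

threshold = 4 * math.sqrt(var_mix)  # a "4-sigma" move under the overall volatility
tail_mixture = sum(p * two_sided_tail(s, threshold) for p, s in regimes)
tail_normal = two_sided_tail(math.sqrt(var_mix), threshold)

print(excess_kurtosis, tail_mixture, tail_normal)
```

With these assumed numbers the excess kurtosis is about 5.3, and a 4-sigma move is far more likely under the mixture than under a single normal with the same overall variance.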

How can we visually check if our data fits a certain theoretical distribution (e.g., normal)? The histogram is a start, but a more rigorous tool is the Quantile-Quantile (Q-Q) plot.
This plot compares the quantiles of our empirical data against the theoretical quantiles of the distribution we are testing.
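The points of a normal Q-Q plot can be computed without any plotting library. In this sketch the return series is hypothetical, the plotting positions $(i + 0.5)/n$ are one common convention among several, and `normal_qq_points` is a name introduced here.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(data):
    """Return (theoretical, empirical) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n avoid the undefined 0% and 100% quantiles.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Hypothetical returns (illustrative only); the -4.5% day is the outlier.
returns = [0.004, -0.002, 0.001, 0.003, -0.001, 0.002, -0.045, 0.005, 0.0, -0.003]
points = normal_qq_points(returns)
worst_theoretical, worst_empirical = points[0]

print(worst_theoretical, worst_empirical)
```

If the data were normal, the pairs would lie close to the line $y = x$. Here the smallest empirical quantile sits well below its theoretical counterpart, which is the signature of a fat left tail.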
Traders and quants use a menagerie of distributions to model different market phenomena. Choosing the right one is key to building robust models.
| Distribution | Use Case in Trading | Key Parameters |
|---|---|---|
| Normal | A flawed but foundational model for daily/weekly returns. Central to Black-Scholes. | Mean ($\mu$), Std. Dev. ($\sigma$) |
| Log-normal | Modelling asset prices ($S_t$), which cannot be negative. If log-returns are normal, prices are log-normal. | $\mu$ and $\sigma$ of the log-returns. |
| Student’s t | A superior model for returns that explicitly incorporates fat tails. A common replacement for the normal. | Degrees of Freedom ($\nu$). As $\nu \to \infty$, it becomes normal. $\nu$ between 3 and 5 is common for returns. |
| Binomial | Modelling a number of “successes” in a fixed number of independent trials (e.g., number of up-days in a week). | Number of trials ($n$), success probability ($p$). |
| Poisson | Modelling the number of events occurring in a fixed interval of time/space (e.g., number of trades in a 1-minute bar). | Rate parameter ($\lambda$). |
| Exponential | Modelling the time *between* events in a Poisson process (e.g., time between two consecutive large trades). | Rate parameter ($\lambda$). |
| Pareto / Power-law | Modelling “black swan” events, the size of market crashes, wealth distribution. Describes “80/20” phenomena. | Shape parameter / exponent ($\alpha$). |
| Weibull / Gamma | Flexible distributions for modelling durations, such as the time until a stop-loss is hit or a trade is closed. | Shape ($k$), Scale ($\theta$). |
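The link between the Poisson and Exponential rows can be sketched by simulation: if waiting times between trades are exponential, the count per bar should behave like a Poisson variable, whose mean and variance both equal the rate. The rate, seed, and the helper `simulate_trade_counts` are assumptions introduced for illustration.

```python
import random

def simulate_trade_counts(rate, n_intervals, rng):
    """Count arrivals per unit interval when gaps are Exponential(rate)."""
    counts = [0] * n_intervals
    clock = rng.expovariate(rate)  # time of the first trade
    while clock < n_intervals:
        counts[int(clock)] += 1
        clock += rng.expovariate(rate)  # exponential waiting time to the next trade
    return counts

rng = random.Random(123)  # fixed seed for reproducibility
rate = 4.0  # assumed average of 4 trades per 1-minute bar (illustrative)
counts = simulate_trade_counts(rate, n_intervals=5_000, rng=rng)

mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / len(counts)

print(mean_count, var_count)
```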

In information theory, the entropy of a distribution measures its “surprise” or uncertainty. Over a fixed, finite set of outcomes, a uniform distribution, where all outcomes are equally likely, has maximum entropy. A distribution concentrated on a single value has zero entropy. For a trader, high entropy implies low predictability.
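Shannon entropy for a discrete distribution takes only a few lines to compute. The three example distributions are hypothetical, and `shannon_entropy` is a name introduced here.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p), skipping zero terms."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # four equally likely outcomes
skewed = [0.85, 0.05, 0.05, 0.05]    # one outcome dominates
certain = [1.0, 0.0, 0.0, 0.0]       # no uncertainty at all

h_uniform = shannon_entropy(uniform)  # 2 bits, the maximum for 4 outcomes
h_skewed = shannon_entropy(skewed)
h_certain = shannon_entropy(certain)  # 0 bits

print(h_uniform, h_skewed, h_certain)
```

The skewed case lands between the two extremes: the more probability mass concentrates on one outcome, the lower the entropy and the more predictable the variable.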
Correlation only measures linear relationships. Markets, however, exhibit complex, non-linear dependencies, especially during crises (e.g., all asset classes fall together). Copulas are functions that separate a multivariate distribution into its marginal distributions (for each variable) and a structure that describes their dependence. This allows for far more sophisticated modelling of portfolio risk than simple correlation matrices.
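The separation of marginals from dependence can be sketched with a Gaussian copula, the simplest member of the family. Everything specific here is an assumption for illustration: the dependence parameter `rho`, the Laplace marginals, their scales, and the helper names. Note that a Gaussian copula has no tail dependence, so it understates joint crashes; it is shown only because it is easy to write down.

```python
import math
import random
from statistics import NormalDist

def gaussian_copula_sample(rho, n, rng):
    """Draw n pairs (u1, u2) of uniforms linked by a Gaussian copula."""
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs

def laplace_inv_cdf(u, scale):
    """Inverse CDF of a zero-mean Laplace marginal (an assumed fat-tailed choice)."""
    return scale * math.log(2 * u) if u < 0.5 else -scale * math.log(2 * (1 - u))

rng = random.Random(2024)  # fixed seed for reproducibility
uniforms = gaussian_copula_sample(rho=0.7, n=5_000, rng=rng)
# Dependence comes from the copula; each marginal is chosen separately.
returns = [(laplace_inv_cdf(u1, 0.008), laplace_inv_cdf(u2, 0.012)) for u1, u2 in uniforms]

same_sign = sum(1 for a, b in returns if a * b > 0) / len(returns)
print(same_sign)
```

Swapping `laplace_inv_cdf` for another inverse CDF changes each asset's marginal without touching the dependence structure, which is the point of the decomposition.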