Historical volatility (also called realized volatility) measures how much returns deviate from their average, i.e. the standard deviation of returns, over a given period of time.
OK, OK, that doesn't explain shit … what is it?
Get data
We are going to use CryptoCompare's free API to fetch daily OHLC prices for BTCUSD.
import pandas as pd
# daily OHLC candles for BTC/USD from CryptoCompare's free API
url = "https://min-api.cryptocompare.com/data/v2/histoday?fsym=BTC&tsym=USD&limit=100"
response = pd.read_json(url)
# the candles sit in the nested Data.Data array of the JSON response
df = pd.DataFrame(response["Data"]["Data"], columns=['time', 'open', 'high', 'low', 'close'])
print(df.tail())
time open high low close
96 1616112000 57643.32 59453.32 56283.37 58060.21
97 1616198400 58060.21 59911.70 57837.79 58101.34
98 1616284800 58101.34 58621.34 55635.61 57374.95
99 1616371200 57374.95 58431.44 53795.78 54095.36
100 1616457600 54095.36 55455.64 52984.36 55141.96
Transform data
The time column is returned as a Unix timestamp, but we want to work with a date type instead, so let's cast it.
# cast the Unix timestamp (seconds) to a proper datetime
df['time'] = pd.to_datetime(df['time'], unit='s')
print(df.tail())
time open high low close
96 2021-03-19 57643.32 59453.32 56283.37 58060.21
97 2021-03-20 58060.21 59911.70 57837.79 58101.34
98 2021-03-21 58101.34 58621.34 55635.61 57374.95
99 2021-03-22 57374.95 58431.44 53795.78 54095.36
100 2021-03-23 54095.36 55455.64 52984.36 55141.96
Return
Why we use log returns instead of simple returns is a longer story, but you can think of it as a simple rate vs. a continuously compounded rate.
Detailed info is in the returns, risk-adjusted ratios, metrics and interest-rate blog posts.
\[ \textcolor{blue} { r_i = \log{\frac{x_i}{x_{i-1}}} = \log({x_i}) - \log{(x_{i-1})} } \]
import numpy as np
# log return: r_i = ln(close_i / close_{i-1})
df = df.assign(prev_close=df.close.shift(1))
df = df.assign(logr=np.log(df.close / df.prev_close))
print(df.tail())
time open high low close prev_close logr
96 2021-03-19 57643.32 59453.32 56283.37 58060.21 57643.32 0.007206
97 2021-03-20 58060.21 59911.70 57837.79 58101.34 58060.21 0.000708
98 2021-03-21 58101.34 58621.34 55635.61 57374.95 58101.34 -0.012581
99 2021-03-22 57374.95 58431.44 53795.78 54095.36 57374.95 -0.058859
100 2021-03-23 54095.36 55455.64 52984.36 55141.96 54095.36 0.019163
Variance
Variance (σ² - sigma squared) is the average of squared distances from the mean (μ - mu).
\[ \textcolor{blue} { \sigma^2 = \frac{\sum_{i=1}^{N} (r_i - \mu)^2}{N} } \]
# the first row has no previous close, so its log return is NaN,
# which leaves size - 1 valid samples
n = df['logr'].size - 1
print("Total " + str(n) + " samples in dataset")
mu = df['logr'].mean()
print(mu)

def squared_distance(x, m):
    return (x - m)**2

variance = df['logr'].apply(squared_distance, args=(mu,)).sum() / n
print(variance)
Total 100 samples in dataset
0.010566139516163857
0.0022171923333002158
Volatility
Volatility (σ - sigma) is as simple as the square root of variance; it is just the standard deviation of the log returns, which are commonly assumed to be (approximately) normally distributed.
\[ \textcolor{blue} { \sigma = \sqrt{\frac{\sum_{i=1}^{N} (r_i - \mu)^2}{N}} } \]
vol = np.sqrt(variance)
print(vol)
# pandas' std() uses the sample (N-1) denominator by default, hence the small difference
std = df['logr'].std()
print(std)
0.04708707182762818
0.04732428779659303
Most of the time we will work with annualized volatility, which is daily volatility * square root of trading days, where trading days is 365 for crypto markets since they never close.
annualized_vol = vol * np.sqrt(365)
print(annualized_vol)
0.8995972441346064
Rolling volatility
This is just the annualized historical volatility computed over a trailing window, e.g. 7-day, 30-day, etc.
def volatility(w, n):
    # annualized volatility over a rolling window of log returns
    mu = w.mean()
    variance = w.apply(squared_distance, args=(mu,)).sum() * 365 / n
    return np.sqrt(variance)

df = df.assign(vol7day=df.logr.rolling(7).apply(volatility, args=(7,)))
df = df.assign(vol30day=df.logr.rolling(30).apply(volatility, args=(30,)))
print(df.loc[:, ['time', 'vol7day', 'vol30day']].tail())
time vol7day vol30day
96 2021-03-19 0.766401 0.815991
97 2021-03-20 0.583027 0.814464
98 2021-03-21 0.541779 0.768962
99 2021-03-22 0.545397 0.795786
100 2021-03-23 0.537617 0.792567
And here are the 7-day vs. 30-day rolling volatility graphs:
import matplotlib.pyplot as plt
filename = 'hv-rolling.png'
# pandas creates its own figure, so pass figsize to plot() instead of calling plt.figure()
df[-60:].plot(x='time', y=['vol7day', 'vol30day'], figsize=(8, 6))
plt.savefig(filename)
filename
Volatility models
Close-close
The close-close historical volatility model is quite similar to the classic model calculated above, with two main differences:
- we assume the mean is 0, so there is no distance-from-the-mean subtraction, only the squared log returns
- we calculate annualized volatility directly, mind the 365 term under the square root
\[ \textcolor{blue} { \sigma_{cc} = \sqrt{\frac{\sum_{i=1}^{N} \ln{(\frac{c_i}{c_{i-1}})}^2 * 365 }{N}} } \]
def squared_log(r):
    # the input is already a log return, just square it
    return r**2

def closeclose(w, n):
    var = w.apply(squared_log).sum() * 365 / n
    return np.sqrt(var)
df = df.assign(cc30day=df.logr.rolling(30).apply(closeclose, args=(30,)))
print(df.loc[:, ['time', 'logr', 'cc30day']].tail())
time logr cc30day
96 2021-03-19 0.007206 0.818845
97 2021-03-20 0.000708 0.817973
98 2021-03-21 -0.012581 0.769133
99 2021-03-22 -0.058859 0.796060
100 2021-03-23 0.019163 0.793004
Close-close vs. classic 30-day volatility: the two track each other quite closely.
filename = 'hv-closeclose.png'
df[-60:].plot(x='time', y=['vol30day', 'cc30day'])
plt.savefig(filename)
filename
Parkinson
The close-close model uses today's close vs. yesterday's close and ignores a lot of intraday volatility; the Parkinson model tries to solve that by using high (hᵢ) and low (lᵢ) prices.
\[ \textcolor{blue} { \sigma_{pa} = \sqrt{\frac{\sum_{i=1}^{N} \ln{\frac{h_i}{l_i}}^2 * 365 }{N * 4 * \ln2}} } \]
def parkinson(w, n):
    # mind the parentheses: N * 4 * ln2 is all in the denominator
    var = w.apply(squared_log).sum() * 365 / (n * 4 * np.log(2))
    return np.sqrt(var)
df = df.assign(hllogr=np.log(df.high / df.low))
df = df.assign(par30day=df.hllogr.rolling(30).apply(parkinson, args=(30,)))
print(df.loc[:, ['time', 'hllogr', 'par30day']].tail())
time hllogr par30day
96 2021-03-19 0.054792 0.984448
97 2021-03-20 0.035229 0.984873
98 2021-03-21 0.052275 0.966696
99 2021-03-22 0.082659 0.973818
100 2021-03-23 0.045587 0.972920
Parkinson vs. classic 30-day volatility; Parkinson reads somewhat higher here since the high-low range captures intraday swings that close-to-close misses.
filename = 'hv-parkinson.png'
df[-60:].plot(x='time', y=['vol30day', 'par30day'])
plt.savefig(filename)
filename
Garman-Klass
To improve on the Parkinson model, Garman-Klass uses both open-close and high-low prices.
\[ \textcolor{blue} { \sigma_{gk} = \sqrt{\frac{365}{N}} * \sqrt{\sum_{i=1}^{N} \frac{\ln{(\frac{h_i}{l_i})}^2}{2} - (2*\ln2-1) * \sum_{i=1}^{N} \ln{(\frac{c_i}{o_i})}^2 } } \]
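Here is a minimal sketch in the same rolling style as above; the oclogr, gk_term and gk30day column names are my own, not from the formula.
# open-to-close log return and the per-day Garman-Klass term
df = df.assign(oclogr=np.log(df.close / df.open))
df = df.assign(gk_term=df.hllogr**2 / 2 - (2 * np.log(2) - 1) * df.oclogr**2)

def garman_klass(w, n):
    # annualized volatility from the rolling sum of per-day GK terms
    return np.sqrt(w.sum() * 365 / n)

df = df.assign(gk30day=df.gk_term.rolling(30).apply(garman_klass, args=(30,)))
print(df.loc[:, ['time', 'gk30day']].tail())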
Rogers-Satchell
Then comes Rogers-Satchell, which, unlike the models above, allows for a non-zero drift (trending prices).
\[ \textcolor{blue} { \sigma_{rs} = \sqrt{\frac{365}{N}} * \sqrt{\sum_{i=1}^{N} \ln{\frac{h_i}{c_i}} \ln{\frac{h_i}{o_i}} + \ln{\frac{l_i}{c_i}} \ln{\frac{l_i}{o_i}} } } \]
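Again a minimal sketch in the same style, with rs_term and rs30day as my own column names.
# per-day Rogers-Satchell term
df = df.assign(rs_term=np.log(df.high / df.close) * np.log(df.high / df.open)
                       + np.log(df.low / df.close) * np.log(df.low / df.open))

def rogers_satchell(w, n):
    # annualized volatility from the rolling sum of per-day RS terms
    return np.sqrt(w.sum() * 365 / n)

df = df.assign(rs30day=df.rs_term.rolling(30).apply(rogers_satchell, args=(30,)))
print(df.loc[:, ['time', 'rs30day']].tail())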
Yang-Zhang
And finally the Yang-Zhang model, which takes both overnight jumps and drift into account.
\[ \textcolor{blue} { \sigma_{yz} = \sqrt{365} * \sqrt{ \sigma_{close-to-open}^2 + k*\sigma_{open-to-close}^2 + (1-k)* \sigma_{rs}^2 } } \]
where: \[ \textcolor{blue} { k = \frac{0.34}{1.34 + \frac{N+1}{N-1}} } \]
\[ \textcolor{blue} { \sigma_{close-to-open}^2 = \frac{1}{N-1} * \sum_{i=1}^{N} { [\ln(\frac{o_i}{c_{i-1}})-\overline{\ln(\frac{o_i}{c_{i-1}})}]^2 } } \]
\[ \textcolor{blue} { \sigma_{open-to-close}^2 = \frac{1}{N-1} * \sum_{i=1}^{N} { [\ln(\frac{c_i}{o_i})-\overline{\ln(\frac{c_i}{o_i})}]^2 } } \]
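And a sketch of the whole thing, reusing oclogr and rs_term from the sketches above; colog and yz30day are my own names, and note that σ_rs² here is the per-period (non-annualized) Rogers-Satchell variance, so we annualize only once at the end.
N = 30
k = 0.34 / (1.34 + (N + 1) / (N - 1))

# overnight (close-to-open) log return
df = df.assign(colog=np.log(df.open / df.close.shift(1)))
sigma_co2 = df.colog.rolling(N).var()     # pandas var() already uses the N-1 denominator
sigma_oc2 = df.oclogr.rolling(N).var()
sigma_rs2 = df.rs_term.rolling(N).mean()  # per-period RS variance

df = df.assign(yz30day=np.sqrt(365 * (sigma_co2 + k * sigma_oc2 + (1 - k) * sigma_rs2)))
print(df.loc[:, ['time', 'yz30day']].tail())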
Daunting, huh? Not really: they are just formulas with more terms for a more accurate estimate; the underlying volatility concept stays the same.
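To wrap up, a sketch that puts all the 30-day estimators on one chart; the hv-models.png filename and the gk30day, rs30day and yz30day columns come from my sketches above.
filename = 'hv-models.png'
df[-60:].plot(x='time', y=['vol30day', 'cc30day', 'par30day', 'gk30day', 'rs30day', 'yz30day'], figsize=(8, 6))
plt.savefig(filename)
filename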