I'm trying to plot a subsample of a series together with the series itself, using this code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set()
# Create random walk data with a time series index
rnd_index = pd.date_range(start='2022-09-01', end='2022-09-30', freq='1D')
np.random.seed(0)
rnd_data = np.random.randint(-2, 4, size=(len(rnd_index))).cumsum()/10
series = pd.Series(rnd_data, index=rnd_index)
# Set up axis for plotting
ax = plt.gca()
series.plot(alpha=0.5, ax=ax)
# Subsample Fridays from the series
sub_series = series.asfreq(freq="W-FRI")
sub_series.plot(alpha=0.5, style="+", ax=ax)
plt.legend(['series', 'sub_series'], loc='upper left')
This ends up in the following plot:
It is barely visible, but the first data point of sub_series is cut in half at the left edge of the plot. Looking at the data, the first Friday, 2 September, is present in both series and sub_series with a value of 0.5.
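To rule out the pandas side, here is a minimal sanity check. It rebuilds the same series and sub_series as above and confirms that every subsampled timestamp is a Friday that also exists in series with the same value. The names first_friday and values_match are only introduced for this check.

```python
import numpy as np
import pandas as pd

# Rebuild the same data as in the plotting code
rnd_index = pd.date_range(start='2022-09-01', end='2022-09-30', freq='1D')
np.random.seed(0)
rnd_data = np.random.randint(-2, 4, size=len(rnd_index)).cumsum() / 10
series = pd.Series(rnd_data, index=rnd_index)
sub_series = series.asfreq(freq="W-FRI")

# The first Friday should be part of both indices
first_friday = pd.Timestamp('2022-09-02')
print(first_friday in series.index, first_friday in sub_series.index)

# Every subsampled timestamp should exist in the full series ...
print(sub_series.index.isin(series.index).all())

# ... and carry the same value there
values_match = np.allclose(
    sub_series.to_numpy(), series.loc[sub_series.index].to_numpy()
)
print(values_match)
```

If all of these checks pass, the offset has to be introduced at plotting time rather than by asfreq.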
1) Why is the sub_series data offset from series?
The second thing I don't understand is what happens when we reorder the plotting part like this:
sub_series = series.copy(deep=True).asfreq(freq="W-FRI")
sub_series.plot(alpha=0.5, style="+", ax=ax)
series.plot(alpha=0.5, ax=ax)
We end up with this plot:
The data is aligned now, but many points of series are missing.
2) Why is the series data subsampled in the plot?
I guess this stems from a wrong understanding of the mechanism at work. To me it seems the pandas part is all right, because the data output is what I would expect (subsampled, with correct values for the indices). The behavior of matplotlib, on the other hand, I don't understand. Plotting series and sub_series individually gives the expected results. Combining them ends in disaster (think about the wrong conclusions you would draw from misaligned indices). Thanks for enlightening me. :-)
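For comparison, here is a sketch that bypasses Series.plot and passes the index and values to matplotlib directly, to narrow down where the misalignment comes from. It is only a diagnostic, not an explanation; my guess that Series.plot handles the datetime index differently from plain ax.plot is an assumption. The names line_full and line_sub are just for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the same data as in the plotting code
rnd_index = pd.date_range(start='2022-09-01', end='2022-09-30', freq='1D')
np.random.seed(0)
rnd_data = np.random.randint(-2, 4, size=len(rnd_index)).cumsum() / 10
series = pd.Series(rnd_data, index=rnd_index)
sub_series = series.asfreq(freq="W-FRI")

# Plot with matplotlib directly instead of Series.plot, so both lines
# share the same plain datetime x-axis
fig, ax = plt.subplots()
line_full, = ax.plot(series.index, series.to_numpy(),
                     alpha=0.5, label='series')
line_sub, = ax.plot(sub_series.index, sub_series.to_numpy(), '+',
                    alpha=0.5, label='sub_series')
ax.legend(loc='upper left')
fig.autofmt_xdate()  # rotate the date tick labels for readability

# Number of points matplotlib actually received for each line
print(len(line_full.get_xdata()), len(line_sub.get_xdata()))
```

Calling plt.show() afterwards displays the figure. If the markers sit on the line here, regardless of plotting order, that would point to the pandas plotting wrapper rather than to matplotlib itself.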
