I'm trying to plot a subsample of a series together with the series itself, using this code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set()
# Create random walk data with a time series index
rnd_index = pd.date_range(start='2022-09-01', end='2022-09-30', freq='1D')
np.random.seed(0)
rnd_data = np.random.randint(-2, 4, size=(len(rnd_index))).cumsum()/10
series = pd.Series(rnd_data, index=rnd_index)
# Set up axis for plotting
ax = plt.gca()
series.plot(alpha=0.5, ax=ax)
# Subsample Fridays from the series
sub_series = series.asfreq(freq="W-FRI")
sub_series.plot(alpha=0.5, style="+", ax=ax)
plt.legend(['series', 'sub_series'], loc='upper left')

This ends up in the following plot:

[Image: off plot]

One can barely see it, but the first data point of sub_series is cut in half at the left edge of the plot. Looking at the data, the first Friday, 2 September, is in both series and sub_series with a value of 0.5.
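The claim that the data itself is consistent can be checked without any plotting. The following is a minimal sketch that rebuilds the same series and compares the subsample against the daily values; the helper names `all_fridays` and `same_values` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the daily random walk and the Friday subsample from the question
rnd_index = pd.date_range(start="2022-09-01", end="2022-09-30", freq="1D")
np.random.seed(0)
rnd_data = np.random.randint(-2, 4, size=len(rnd_index)).cumsum() / 10
series = pd.Series(rnd_data, index=rnd_index)
sub_series = series.asfreq(freq="W-FRI")

# Every subsampled timestamp should be a Friday (dayofweek == 4)
all_fridays = bool((sub_series.index.dayofweek == 4).all())

# Values at the shared timestamps should be identical in both objects
same_values = np.array_equal(
    series.loc[sub_series.index].to_numpy(), sub_series.to_numpy()
)

print(sub_series.index[0].date(), all_fridays, same_values)
```

If both checks pass, any offset seen in the figure comes from the plotting layer rather than from `asfreq`.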

1) Why is the sub_series data offset from series?

The second thing I don't understand is what happens when the plotting part is reordered like this:

sub_series = series.copy(deep=True).asfreq(freq="W-FRI")
sub_series.plot(alpha=0.5, style="+", ax=ax)
series.plot(alpha=0.5, ax=ax)

We end up with this plot:

[Image: missing data points]

The data is aligned now, but a lot of it is missing.

2) Why is the series data subsampled in the plot?

I guess this stems from a wrong understanding of the mechanism at work. To me the pandas part seems all right, because the data output is what I would expect (subsampled, with correct values for the indices). The behavior of matplotlib, on the other hand, I don't understand. Plotting series and sub_series individually gives the expected results. Combining them ends in disaster (think about the wrong conclusions you would draw from misaligned indices). Thanks for enlightening me. :-)

1 Answer

This seems to be an actual bug in pandas. On GitHub you'll find several unresolved issues about it, for example https://github.com/pandas-dev/pandas/issues/11574 and https://github.com/pandas-dev/pandas/issues/29719

As a workaround, the problem does not occur if you use matplotlib directly:

fig, ax = plt.subplots(figsize=(12, 7))
ax.plot(series.index, series.values)
ax.scatter(sub_series.index, sub_series.values, marker='+', c='orange')
ax.set_xlim(np.datetime64('2022-08-30'), np.datetime64('2022-10-01'))
plt.legend(['series', 'sub_series'], loc='upper left')
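If you would rather keep the pandas plot method, another option that may help is the `x_compat=True` keyword, which the pandas visualization docs describe as a way to suppress the automatic tick resolution adjustment for time series. The sketch below assumes that passing it to both calls keeps the two plots on the same plain datetime x-axis; this is not taken from the linked issues, so treat it as something to try rather than a confirmed fix. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same data as in the question
rnd_index = pd.date_range(start="2022-09-01", end="2022-09-30", freq="1D")
np.random.seed(0)
rnd_data = np.random.randint(-2, 4, size=len(rnd_index)).cumsum() / 10
series = pd.Series(rnd_data, index=rnd_index)
sub_series = series.asfreq(freq="W-FRI")

fig, ax = plt.subplots()
# x_compat=True asks pandas to skip its period-based time series handling
series.plot(alpha=0.5, ax=ax, x_compat=True)
sub_series.plot(alpha=0.5, style="+", ax=ax, x_compat=True)
ax.legend(["series", "sub_series"], loc="upper left")

# Compare the x coordinates matplotlib actually uses for each artist
line_x = ax.lines[0].get_xdata(orig=False)
marker_x = ax.lines[1].get_xdata(orig=False)
print(np.isin(marker_x, line_x).all())
```

Comparing `get_xdata(orig=False)` for both artists is a quick way to see whether the markers sit on the same x positions as the line, independent of how the figure is rendered.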

1 Comment

Thanks @Phillip Steiner. It's really unfortunate to run into such bugs at the very beginning. The tip to use matplotlib directly is probably the best way anyway, as the pandas plot method is just a convenience wrapper.