Why is my matplotlib plot either off or incomplete when plotting a pandas series?

Question

I'm trying to relate a sample from a series with the series itself in a plot with this code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set()
# Create random walk data with a time series index
rnd_index = pd.date_range(start='2022-09-01', end='2022-09-30', freq='1D')
np.random.seed(0)
rnd_data = np.random.randint(-2, 4, size=(len(rnd_index))).cumsum()/10
series = pd.Series(rnd_data, index=rnd_index)
# Set up axis for plotting
ax = plt.gca()
series.plot(alpha=0.5, ax=ax)
# Subsample Fridays from the series
sub_series = series.asfreq(freq="W-FRI")
sub_series.plot(alpha=0.5, style="+", ax=ax)
plt.legend(['series', 'sub_series'], loc='upper left')

This ends up in the following plot:

One can barely see it, but the first data point from the sub_series is cut in half on the left side. Looking at the data one can see that the first Friday on the 2nd of September is in both series and sub_series with a value of 0.5.

1) Why is the sub_series data offset from series?

The second thing I don't understand here is when we reorder the plotting part in this way:

sub_series = series.copy(deep=True).asfreq(freq="W-FRI")
sub_series.plot(alpha=0.5, style="+", ax=ax)
series.plot(alpha=0.5, ax=ax)

We end up with a plot in which the data is now aligned, but a lot of the series data is missing.

2) Why is the series data subsampled in the plot?

I guess this stems from a wrong understanding of the mechanism at work. To me it seems the pandas part is all right, because the data output is what I would expect (sub-sampled, with correct values for the indices). The behavior of matplotlib, on the other hand, I don't understand. Plotting series and sub_series individually gives the expected results. Combining them ends in disaster (think about the wrong conclusions you would draw from misaligned indices). Thanks for enlightening me. :-)

Philipp Steiner · Accepted Answer · 2022-10-04 07:58:45Z


This seems to be an actual bug in pandas. If you check GitHub, you'll find several unsolved issues, for example https://github.com/pandas-dev/pandas/issues/11574 and https://github.com/pandas-dev/pandas/issues/29719
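A plausible explanation for the offset in question 1, offered as an assumption about pandas' plotting internals rather than something the linked issues confirm: when a weekly series is drawn on an axis that already holds daily data, pandas seems to re-express the weekly periods at daily resolution, anchored at the start of each period. A W-FRI period runs from Saturday to Friday, so every marker would land six days early. The sketch below only demonstrates that period arithmetic with public pandas calls; it does not run the plotting code itself.

```python
import pandas as pd

# The Fridays that asfreq("W-FRI") picks out of the September 2022 series
fridays = pd.date_range(start="2022-09-02", end="2022-09-30", freq="W-FRI")

# Weekly periods ending on Friday; each one spans Saturday through Friday
weekly = fridays.to_period("W-FRI")

# Daily resolution, anchored at the START of each weekly period
# (assumed to be what the plotting code does in this situation)
as_daily_start = weekly.asfreq("D", how="start")

# Anchored at the END of each period, the original Fridays come back
as_daily_end = weekly.asfreq("D", how="end")

# Distance between each real Friday and its start-anchored position
shift = fridays - as_daily_start.to_timestamp()

print(as_daily_start[0], as_daily_end[0], shift[0])
```

If this is indeed the mechanism, the values in sub_series are untouched and only their x positions on the shared axis move.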

The problem does not occur if you use matplotlib directly as a workaround:

fig, ax = plt.subplots(figsize=(12, 7))
ax.plot(series.index, series.values)
ax.scatter(sub_series.index, sub_series.values, marker='+', c='orange')
ax.set_xlim(np.datetime64('2022-08-30'), np.datetime64('2022-10-01'))
plt.legend(['series', 'sub_series'], loc='upper left')
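If you would rather keep the pandas plot method, a possible alternative to the workaround above is its documented x_compat=True option, which switches off pandas' automatic time-series axis handling and leaves the dates to matplotlib. This is a sketch, assuming that handling is what causes the problem here; I have not checked it against the linked issues. The data setup repeats the question's code without seaborn.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same random-walk data as in the question
rnd_index = pd.date_range(start="2022-09-01", end="2022-09-30", freq="1D")
np.random.seed(0)
rnd_data = np.random.randint(-2, 4, size=len(rnd_index)).cumsum() / 10
series = pd.Series(rnd_data, index=rnd_index)
sub_series = series.asfreq(freq="W-FRI")

fig, ax = plt.subplots(figsize=(12, 7))
# x_compat=True asks pandas to skip its period-based x-axis logic,
# so both series should share plain matplotlib date coordinates
series.plot(alpha=0.5, ax=ax, x_compat=True)
sub_series.plot(alpha=0.5, style="+", ax=ax, x_compat=True)
ax.legend(["series", "sub_series"], loc="upper left")
```

With this option the plotting order should no longer matter, though you lose pandas' automatic tick formatting for time series.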


1 Comment

Pascal Over a year ago

Thanks @Philipp Steiner. It's really unfortunate to run into such bugs at the very beginning. The tip to use matplotlib directly is probably the best way anyway, as the pandas plot method is just a convenience wrapper.
