
Pandas pairwise correlation on a DataFrame comes in handy in many cases. However, in my specific case I would like to use a method not provided by Pandas (something other than pearson, kendall or spearman) to correlate two columns. Is it possible to explicitly define the correlation function to use in this case?

The syntax I would like looks like this:

def my_method(x,y): return something
frame.corr(method=my_method)
  • Can you give an example of what your method is? Commented Aug 14, 2013 at 14:30
  • It doesn't really matter. Given two series x and y it returns a coefficient in [0,1] indicating the correlation between the two variables just like Spearman does. Commented Aug 14, 2013 at 14:36
  • Not an issue for the question, but Spearman's rank correlation returns a coefficient in [-1, 1]. Commented Aug 14, 2013 at 21:40
  • Besides doing it in Cython as Jeff mentions, you could also consider NumPy or Numba for speed. Commented Feb 2, 2018 at 16:56

3 Answers

2

You would need to do this in Cython (with a cythonizable function) to get any kind of performance. The plain-Python version of the loop is:

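import numpy as np
from pandas import DataFrame

# df is the input DataFrame; func(x, y) takes two Series and returns a float
l = len(df.columns)
results = np.zeros((l, l))
for i, ac in enumerate(df):  # iterating a DataFrame yields column labels
    for j, bc in enumerate(df):
        results[j, i] = func(df[ac], df[bc])  # look up the columns by label
results = DataFrame(results, index=df.columns, columns=df.columns)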
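For anyone who wants to try the loop approach directly, here is a minimal runnable sketch. The helper name `pairwise_apply`, the stand-in measure `abs_pearson`, and the sample data are all made up for illustration; swap in your own function. Unlike `DataFrame.corr`, this calls the function for every ordered pair of columns, so it also works for measures that are not symmetric.

```python
import numpy as np
import pandas as pd


def pairwise_apply(df, func):
    """Call func(x, y) for every ordered pair of columns of df.

    The row label is the first argument and the column label the second.
    """
    cols = df.columns
    out = np.zeros((len(cols), len(cols)))
    for i, a in enumerate(cols):
        for j, b in enumerate(cols):
            out[i, j] = func(df[a], df[b])
    return pd.DataFrame(out, index=cols, columns=cols)


def abs_pearson(x, y):
    # Hypothetical custom measure: absolute Pearson r, which lies in [0, 1].
    return abs(np.corrcoef(x, y)[0, 1])


# Made-up sample data.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [4.0, 3.0, 2.0, 1.0],
        "c": [1.0, 3.0, 2.0, 4.0],
    }
)
result = pairwise_apply(df, abs_pearson)
print(result)
```

This is O(n_columns²) Python-level calls, so it is only practical for a modest number of columns.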

0

Check out the documentation for DataFrame.corr(), which accepts a callable as of version 0.24.0:

Parameters
----------
    method : {'pearson', 'kendall', 'spearman'} or callable
        * pearson : standard correlation coefficient
        * kendall : Kendall Tau correlation coefficient
        * spearman : Spearman rank correlation
        * callable: callable with input two 1d ndarrays
            and returning a float. Note that the returned matrix from corr
            will have 1 along the diagonals and will be symmetric
            regardless of the callable's behavior
            .. versionadded:: 0.24.0

Also check out DataFrame.corrwith().

Warning: corr() always produces a symmetric correlation matrix. That suits symmetric measures such as Cramér's V, but it is not suitable for Theil's U or other asymmetric measures.
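A small sketch of what that looks like in practice, assuming pandas 0.24.0 or later. The functions `abs_pearson` and `mean_gap` and the sample data are made up for illustration. The second function is deliberately asymmetric, to show that `corr` still mirrors one value across the diagonal and puts 1 on the diagonal.

```python
import numpy as np
import pandas as pd


def abs_pearson(x, y):
    # x and y arrive as 1-D ndarrays, as the docstring says.
    return abs(np.corrcoef(x, y)[0, 1])


def mean_gap(x, y):
    # Deliberately asymmetric toy function: mean_gap(x, y) == -mean_gap(y, x).
    return float(x.mean() - y.mean())


# Made-up sample data.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [8.0, 6.0, 4.0, 2.0],
        "c": [1.0, 3.0, 2.0, 6.0],
    }
)

sym = df.corr(method=abs_pearson)   # symmetric measure: fine
forced = df.corr(method=mean_gap)   # asymmetric measure: silently symmetrized

print(sym)
print(forced)
```

In `forced`, the entries for (a, b) and (b, a) are identical even though `mean_gap` would return opposite signs for them, so an asymmetric measure needs an explicit loop over column pairs instead.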


0
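import numpy as np

def spearman_rank_pandas(rank_series1: np.ndarray, rank_series2: np.ndarray):
    # Both inputs must already be ranks; the formula below assumes no ties.
    if np.isnan(rank_series1).all() or np.isnan(rank_series2).all():
        return np.nan

    rank_diff = rank_series1 - rank_series2

    # rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    top = 6 * ((rank_diff**2).sum())
    bottom = len(rank_diff) * (len(rank_diff)**2 - 1)

    rho = 1 - (top/bottom)

    assert ((rho >= -1) and (rho <= 1)), "Error in your stats"
    return rho

frame = frame[["x1", "x2", "y"]]

def my_method(frame):
    # frame is assumed to hold ranks already (e.g. the result of frame.rank()),
    # because corr() passes the column values to the callable unchanged.
    return frame.corr(method=spearman_rank_pandas)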
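As a quick sanity check, here is a sketch that compares a condensed copy of the rank-difference formula against the built-in `method="spearman"`. The sample data is made up and has no ties, which the formula requires; the columns are ranked with `frame.rank()` before the callable is applied, since the formula operates on ranks rather than raw values.

```python
import numpy as np
import pandas as pd


def spearman_rank_pandas(rank_series1, rank_series2):
    # Condensed copy of the formula: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    rank_diff = rank_series1 - rank_series2
    n = len(rank_diff)
    return 1 - 6 * (rank_diff ** 2).sum() / (n * (n ** 2 - 1))


# Made-up sample data without ties.
frame = pd.DataFrame(
    {
        "x1": [10.0, 20.0, 30.0, 40.0, 50.0],
        "x2": [1.0, 3.0, 2.0, 5.0, 4.0],
        "y": [5.0, 4.0, 3.0, 2.0, 1.0],
    }
)

custom = frame.rank().corr(method=spearman_rank_pandas)
builtin = frame.corr(method="spearman")

print(custom)
print(builtin)
```

With tied values the two results can differ, because the shortcut formula is only exact when all ranks are distinct.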
