Matrix Completion Problem using NExOS.jl¶
Shuvomoy Das Gupta
A matrix completion problem can be formulated as the following optimization problem:
$$ \begin{array}{ll} \textrm{minimize} & \sum_{(i,j)\in\Omega}(X_{ij}-Z_{ij})^{2}\\ \textrm{subject to} & \mathop{{\bf rank}}(X)\leq r\\ & \|X\|_{2}\leq M\\ & X\in\mathbf{R}^{m\times n}, \end{array} $$
where $Z\in\mathbf{R}^{m\times n}$ is the matrix whose entries $Z_{ij}$ are observable for $(i,j)\in\Omega$. Based on these observed entries, our goal is to construct a matrix $X\in\mathbf{R}^{m\times n}$ that has rank $r$.
First, of course, we load our packages.
using Random, LinearAlgebra, NExOS
Random.seed!(1234)
MersenneTwister(UInt32[0x000004d2], Random.DSFMT.DSFMT_state(Int32[-1393240018, 1073611148, 45497681, 1072875908, 436273599, 1073674613, -2043716458, 1073445557, -254908435, 1072827086 … -599655111, 1073144102, 367655457, 1072985259, -1278750689, 1018350124, -597141475, 249849711, 382, 0]), [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], UInt128[0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000 … 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000], 1002, 0)
Construct a random $m\times n$ matrix matrix of rank $k$.
m,n,k = 40,40,5
Zfull = randn(m,k)*randn(k,n) # ground truth data
Zfull = Zfull/opnorm(Zfull,2) # scale the matrix so its operator norm is 1
M = opnorm(Zfull,2)
1.0000000000000002
Suppose that we only observe a fraction of entries in Zfull. Let us find the indices of all the elements that are available.
n_obs = 600
Zobs = fill(NaN,(m,n))
obs = randperm(m*n)[1:n_obs]
Zobs[obs] .= Zfull[obs] # partially observed matrix
600-element view(::Array{Float64,1}, [314, 484, 770, 1462, 495, 602, 637, 1284, 358, 1510 … 1159, 371, 157, 537, 589, 533, 221, 1025, 564, 529]) with eltype Float64:
0.04614085153216508
0.029552887430458176
-0.0184886116014581
-0.04597613170772224
0.06158831071378109
0.04359079928175278
-0.014791813279264342
0.006936983760815719
0.01650141696962079
-0.014561284205705108
0.0251527010880924
0.008374291722196085
-0.00335913745738921
⋮
-0.008553749580091464
-0.010747230035676235
-0.01914100968377146
-0.055173021978315376
-0.16168361561961594
-0.02789400226183489
-0.02613098308987431
-0.010856985785824986
0.01568346928771265
-0.025649452543356512
-0.003523186151803044
0.022307043854383427
Plot the ground-truth, full dataset and our partial observations
using PyPlot
Plot the ground-truth, full dataset and our partial observations
figure(figsize=(7,14))
subplot(1,2,1)
imshow(Zfull,cmap=ColorMap("Blues"),interpolation="None")
xticks([]),yticks([]),title("True Data",fontweight="bold")
subplot(1,2,2)
imshow(Zobs,cmap=ColorMap("Blues"),interpolation="None")
xticks([]),yticks([]),title("Observed Data",fontweight="bold")
show()
PyPlot.display_figs()
Let us create our problem now:
M = opnorm(Zfull,2)
f = SquaredLossMatrixCompletion(Zobs, iterative = true)
r = rank(Zfull)
Z0 = zeros(size(Zobs))
C = RankSet(M, r)
settings = Settings(μ_max = 5, μ_mult_fact = 0.5, n_iter_min = 1000, n_iter_max = 1000, verbose = false, freq = 1000, tol = 1e-4, γ_updt_rule = :safe)
problem = NExOS.Problem(f, C, settings.β, Z0)
Problem{ProximalOperators.LeastSquaresIterative{1,Float64,Float64,SparseArrays.SparseMatrixCSC{Float64,Int64},Array{Float64,1},ProximalOperators.AAc},RankSet{Float64,Int64},Float64,Array{Float64,2}}(description : Least squares penalty
domain : n/a
expression : n/a
parameters : n/a, RankSet{Float64,Int64}(1.0000000000000002, 5), 1.0e-10, [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0])
Time to solve our problem...
state_final = solve!(problem, settings)
Z_estimated = state_final.x
40×40 Array{Float64,2}:
0.00921175 -0.0038448 -0.00129091 … -0.00447769 -0.00843496
-0.0147354 0.0011719 -0.000635124 0.00706023 0.00987895
-0.0444565 -0.00207041 -0.0177526 0.0141985 0.00632431
-0.0026325 0.00607353 0.0049039 -0.00903814 0.00128217
0.0112686 0.0109044 -0.0102908 0.00818285 -0.0127803
0.0291193 -0.00187681 0.000716274 … 0.013873 -0.00132453
-0.0028306 -0.0188753 -0.0163884 0.0218045 -0.00466638
-0.00147231 0.0134306 0.0115974 -0.00983515 0.00878667
0.0105054 0.00433151 0.00609612 0.00531648 0.00636103
-0.0051754 0.0209462 0.0130169 -0.0116038 0.0111824
-0.0119 -0.000936404 -0.0142817 … 0.00737043 -0.00860866
-0.0144667 -0.00838286 -0.0160724 0.0259919 0.00282836
-0.00400986 0.0017526 -0.00431504 0.019474 0.00892805
⋮ ⋱
0.0347436 0.0353489 0.0538003 -0.0787362 0.00203755
-0.00400174 0.0174897 0.0194419 -0.0309039 0.00560616
-0.0107255 -0.0128399 -0.000264674 … 0.0154012 0.0140037
0.0289964 0.00806107 0.0227597 -0.0375122 -0.00791514
-0.00530796 0.00876034 0.00745653 -0.00555373 0.00819782
-0.0171454 -0.0176783 -0.0123308 0.0109885 -0.000278122
0.00567607 0.00238189 0.00376122 0.00607806 0.00619147
-0.00492816 -0.0150325 -0.013713 … 0.0200635 -0.00200861
0.0283517 0.0228515 0.0230801 0.00278105 0.0182905
-0.0135796 0.00143811 -0.0111562 0.0100748 -0.00156238
-0.00264522 0.00379569 0.0072004 -0.000408534 0.00941569
-0.0167782 -0.00217151 -0.0046546 0.00407195 0.00347602
Finally, we plot a simple histogram to see how much of the matrix has been recovered.
figure(figsize=(8,3))
PyPlot.hist(vec(Zfull - Z_estimated ),100)
xlim([-0.5,0.5]),xlabel("Absolute Errors/Residuals",fontweight="bold"),tight_layout()
show()
PyPlot.display_figs()
So, NExOS does a good job!