Pseudorandom Generators for Halfspaces

Yi Wu (CMU)
Joint work with Parikshit Gopalan (MSR SVC), Ryan O’Donnell (CMU), and David Zuckerman (UT Austin)
Outline
- Introduction
  - Pseudorandom Generators
  - Halfspaces
  - Pseudorandom Generators for Halfspaces
- Our Result
- Proof
- Conclusion
Deterministic Algorithm

Program, input, output: the algorithm deterministically outputs the correct result.
Randomized Algorithm

Program, input, output, plus random bits: the algorithm outputs the correct result with high probability.
Randomized Algorithms

- Primality testing; ST-connectivity
- Order statistics; searching
- Polynomial and matrix identity verification
- Interactive proof systems
- Faster algorithms for linear programming
- Rounding linear program solutions to integers
- Minimum spanning trees, shortest paths, minimum cuts
- Counting and enumeration: matrix permanent, counting combinatorial structures
Is Randomness Necessary?

Open Problem: Can we simulate every randomized polynomial-time algorithm by a deterministic polynomial-time algorithm (the “BPP = P” conjecture)?

Derandomization of randomized algorithms:
- Primality testing [AKS]
- ST-connectivity [Reingold]
- Quadratic residues [?]
How to Generate Randomness?

Question: How to generate randomness for every randomized algorithm?
Simpler Question: How to generate “pseudorandomness” for some class of programs?
Pseudorandom Generator (PRG)

A PRG stretches a short seed of k << n truly random bits into n “pseudorandom” bits. It fools a program if the program answers Yes/No with almost the same probability on the pseudorandom bits as on n truly random bits. The quality of the PRG is measured by its seed length k.
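The fooling condition above can be checked by brute force at toy sizes. A minimal sketch (the function names are ours; real PRGs are analyzed mathematically, not by enumeration):

```python
import itertools

def fools(program, prg, n, k, eps):
    """Check |Pr_x[program(x)=1] - Pr_seed[program(prg(seed))=1]| <= eps
    by exhaustive enumeration over {0,1}^n and {0,1}^k (toy sizes only)."""
    p_true = sum(program(x) for x in itertools.product([0, 1], repeat=n)) / 2 ** n
    p_prg = sum(program(prg(s)) for s in itertools.product([0, 1], repeat=k)) / 2 ** k
    return abs(p_true - p_prg) <= eps

# A 1-bit seed copied n times already fools any program that reads one coordinate:
copy_prg = lambda s: s * 4
assert fools(lambda x: x[0], copy_prg, n=4, k=1, eps=0.0)
```

The point of the toy `copy_prg` example: a seed far shorter than n can suffice, as long as the program class is restricted.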
Why Study PRGs?

Algorithmic applications:
- When k = O(log n), we can derandomize the algorithm in polynomial time.
- Streaming algorithms.

Complexity-theoretic implications:
- Circuit-class lower bounds.
- Learning theory.
PRGs for Classes of Programs

- Space-bounded programs [Nis92]
- Constant-depth circuits [Nis91, Baz07, Bra09]
- Halfspaces [DGJSV09, MZ09]
Halfspaces

Halfspaces: Boolean functions h: R^n -> {-1,1} of the form h(x) = sgn(w1*x1 + … + wn*xn − θ), where w1, …, wn, θ ∈ R.

Well-studied in complexity theory; widely used in machine learning: Perceptron, Winnow, boosting, Support Vector Machines, Lasso, Linear Regression.
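For concreteness, a halfspace as defined above can be evaluated directly. A small sketch (the factory-function style and the sgn(0) = +1 convention are our choices):

```python
def halfspace(w, theta):
    """Return h(x) = sgn(w1*x1 + ... + wn*xn - theta) as a {-1,+1}-valued
    function, taking sgn(0) = +1."""
    def h(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1
    return h

maj = halfspace([1, 1, 1], 0)     # majority of three {-1,1} inputs
dict1 = halfspace([4, 1, 1], 0)   # weight 4 dominates: equals sgn(x1) on {-1,1}^3
```

For example, `maj((1, 1, -1))` is 1, while `dict1((-1, 1, 1))` is -1 because the first weight outweighs the rest.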
Product Distribution

For a halfspace h(x), x is sampled from some product distribution; i.e., each xi is independently sampled from a distribution Di. For example, each Di can be:
- the uniform distribution on {-1,1}
- the uniform distribution on [-1,1]
- a Gaussian distribution
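Sampling from a product distribution just means drawing each coordinate independently. A sketch with the three example coordinate distributions (the sampler names are ours):

```python
import random

rademacher = lambda: random.choice([-1, 1])   # uniform on {-1,1}
solid = lambda: random.uniform(-1.0, 1.0)     # uniform on [-1,1]
gauss = lambda: random.gauss(0.0, 1.0)        # standard Gaussian

def sample_product(dists):
    """One draw from the product distribution D1 x ... x Dn,
    where dists is a list of zero-argument samplers."""
    return [d() for d in dists]

x = sample_product([rademacher, solid, gauss])
```

Note that the coordinates need not be identically distributed; a product distribution only requires independence.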
PRG for Halfspaces

The PRG stretches k << n random bits into pseudorandom variables x1, x2, …, xn. It fools a halfspace h(x) = sign(w1*x1 + … + wn*xn − θ) if h answers Yes/No with almost the same probability on the pseudorandom input as on x1, x2, …, xn drawn from the product distribution.
Geometric Interpretation: PRG for the Uniform Distribution over [-1,1]^2

The PRG's output is a set of poly(dim) points such that, for every halfspace, the number of points inside the halfspace is proportional to its area.
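To see what such a point set buys, here is a crude deterministic stand-in: an m x m grid of cell centers in [-1,1]^2. (This is only an illustration of the geometric picture; a grid needs far more points than a real PRG, and the function name is ours.)

```python
def fraction_inside(h, m):
    """Fraction of an m*m grid of cell centers of [-1,1]^2 on which h(x) = 1;
    for a halfspace this approximates its area fraction."""
    pts = [(-1 + (2 * i + 1) / m, -1 + (2 * j + 1) / m)
           for i in range(m) for j in range(m)]
    return sum(h(p) == 1 for p in pts) / (m * m)

h = lambda p: 1 if p[0] + p[1] >= 0 else -1   # halfspace through the origin
est = fraction_inside(h, 100)                  # true area fraction is 0.5
```

The estimate converges to the halfspace's area fraction as m grows; a PRG achieves the same guarantee with only polynomially many points in the dimension.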
Application to Machine Learning

How many test points are enough to estimate the accuracy of an N-dimensional linear classifier? A good PRG implies we only need to deterministically check the accuracy on a set of poly(N) points!
Other Theoretical Applications

- Discrepancy sets for convex polytopes
- Circuit lower bounds on functions of halfspaces
- Counting the solutions of knapsacks
Previous Result

[DiGoJaSeVi, MeZu] PRG for halfspaces over the uniform distribution on the Boolean cube ({-1,1}^n) with seed length O(log n).
Our Results: Arbitrary Product Distributions

PRG for halfspaces under an arbitrary product distribution over R^n with the same seed length. Only requirement: E[xi^4] is a constant. This covers:
- the Gaussian distribution
- the uniform distribution on the solid cube
- the uniform distribution on the hypercube
- biased distributions on the hypercube
- almost any “natural” distribution
Our Results: Functions of k Halfspaces

- PRG for intersections of k halfspaces with seed length k log(n).
- PRG for arbitrary functions of k halfspaces with seed length k^2 log(n).
Key Observation: Dichotomy of Halfspaces

Under product distributions, every halfspace is close to one of the following:
- “Dictator”: halfspaces depending on very few variables, e.g. f(x) = sgn(x1).
- “Majority”: no variable has too much weight, e.g. f(x) = sgn(x1 + x2 + x3 + … + xn).
Dichotomy of the Weight Distribution

Either the weights decrease fast (geometrically), or the weights are stable after a certain index.
Weights Decrease Fast (Geometrically)

Intuition: consider sign(2^n x1 + 2^(n-1) x2 + 2^(n-2) x3 + … + xn). If each xi is from {-1,1}, this is just sign(x1).
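This intuition is easy to verify exhaustively for small n. A quick check (weights 2^(n-1), …, 2, 1 mirror the slide's geometric decay):

```python
import itertools

def h_geometric(x):
    """sign(2^(n-1)*x1 + 2^(n-2)*x2 + ... + 2*x_{n-1} + x_n)."""
    n = len(x)
    s = sum(2 ** (n - 1 - i) * xi for i, xi in enumerate(x))
    return 1 if s >= 0 else -1

# The top weight exceeds the sum of all the others (2^(n-1) > 2^(n-1) - 1),
# so on {-1,1}^n the function collapses to the dictator sgn(x1):
assert all(h_geometric(x) == (1 if x[0] == 1 else -1)
           for x in itertools.product([-1, 1], repeat=6))
```

The same dominance argument applies at every index, so each prefix of geometrically decreasing weights behaves like a function of very few variables.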
Weights Are Stable

Intuition: consider sign(100 x1 + x2 + x3 + … + xn). For every fixing of x1, it is a majority over the rest of the variables.
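Concretely, once x1 is fixed the remaining function is a threshold on the sum of the other coordinates, i.e., a shifted majority. A small check with 401 remaining coordinates (the sizes are our choice for illustration):

```python
def h_stable(x):
    """sign(100*x1 + x2 + ... + xn)."""
    return 1 if 100 * x[0] + sum(x[1:]) >= 0 else -1

def restricted(x1):
    """h with x1 fixed: a majority of the rest with threshold -100*x1."""
    return lambda rest: 1 if sum(rest) >= -100 * x1 else -1

rest = [1] * 150 + [-1] * 251   # sum(rest) = -101, below the shifted threshold
assert h_stable([1] + rest) == restricted(1)(rest) == -1
rest = [1] * 300 + [-1] * 101   # sum(rest) = 199, above it
assert h_stable([1] + rest) == restricted(1)(rest) == 1
```

So a PRG that fools majorities handles each restriction, which is what makes this half of the dichotomy tractable.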
Our PRG for Halfspaces (Rough)

Randomly hash all the coordinates into groups, and use a 4-wise independent distribution within each group.
- If the halfspace is “dictator-like”: all the important variables land in different groups.
- If it is “majority-like”: x1 + x2 + … + xn is close to Gaussian, and a 4-wise independent distribution (somehow) can handle the Gaussian case.
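The shape of the construction can be sketched in code. This is only a rough illustration of the two ingredients, not the actual PRG from the talk: the hash below uses true randomness, the seed accounting is ignored, and the parity trick gives only approximately unbiased bits. The 4-wise independent values come from a random degree-3 polynomial over a prime field, a standard construction.

```python
import random

def four_wise_bits(n, rng, p=2 ** 31 - 1):
    """n bits in {-1,1} derived from 4-wise independent field elements:
    evaluate a random degree-3 polynomial over GF(p), keep each value's parity."""
    a = [rng.randrange(p) for _ in range(4)]   # the "seed": 4 field elements
    def bit(i):
        v = ((a[0] * i + a[1]) * i + a[2]) * i + a[3]
        return 1 if (v % p) % 2 == 0 else -1
    return [bit(i) for i in range(1, n + 1)]

def rough_prg(n, groups, rng):
    """Hash each coordinate into one of `groups` buckets, then fill the
    coordinates of each bucket from its own 4-wise independent stream."""
    bucket = [rng.randrange(groups) for _ in range(n)]
    streams = [four_wise_bits(n, rng) for _ in range(groups)]
    return [streams[bucket[i]][i] for i in range(n)]

x = rough_prg(20, 4, random.Random(1))
```

The design choice to note: independence is only needed *within* a group, and hashing ensures the heavy "dictator" coordinates rarely collide, so each group sees at most one of them.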
Conclusion

We construct PRGs for halfspaces under arbitrary product distributions, and for functions of k halfspaces, with small seed length.

Future Work: Build PRGs for larger classes of programs, e.g., polynomial threshold functions (SVMs with a polynomial kernel).
