Network Design & Training
Network Design & Training Issues
Design:
- architecture of the network
- structure of artificial neurons
- learning rules
Training:
- ensuring optimum training
- learning parameters
- data preparation and more ...
Network Design
Network Design
Architecture of the network:
- How many nodes? (determines the number of network weights)
- How many layers?
- How many nodes per layer? (input layer, hidden layer, output layer)
Automated methods:
- augmentation (cascade correlation)
- weight pruning and elimination
Network Design
Architecture of the network:
- Connectivity? The concept of a model or hypothesis space
- Constraining the number of hypotheses:
  - selective connectivity
  - shared weights
  - recursive connections
Network Design
Structure of artificial neuron nodes
- Choice of input integration:
  - summed
  - squared and summed
  - multiplied
- Choice of activation (transfer) function:
  - sigmoid (logistic)
  - hyperbolic tangent
  - Gaussian
  - linear
  - soft-max
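As a sketch, the activation functions listed above can be written as plain functions of a node's net input. These are the standard textbook definitions, not taken from any particular toolkit:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes the net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes the net input into (-1, 1)."""
    return math.tanh(x)

def gaussian(x):
    """Gaussian: responds maximally (1.0) at x = 0, decays toward 0 elsewhere."""
    return math.exp(-x * x)

def linear(x):
    """Identity: passes the net input through unchanged."""
    return x

def softmax(xs):
    """Soft-max: turns a vector of net inputs into values that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]  # shift by max for numeric stability
    total = sum(exps)
    return [e / total for e in exps]
```

The soft-max operates on a whole output layer at once, which is why it takes a vector rather than a single net input.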
Network Design
Selecting a learning rule
- Generalized delta rule (steepest descent)
- Momentum descent
- Advanced weight-space search techniques
- The global error function can also vary: normal, quadratic, cubic
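A minimal sketch of momentum descent for a single weight: each step combines the steepest-descent term with a fraction of the previous step. The function name and the tiny demo objective E(w) = w^2 are illustrative choices, not from the deck:

```python
def update_weight(w, gradient, prev_delta, lr=0.1, momentum=0.8):
    """One generalized-delta-rule step with a momentum term:
    delta = -lr * dE/dw + momentum * (previous delta)."""
    delta = -lr * gradient + momentum * prev_delta
    return w + delta, delta

# Demonstrate on E(w) = w^2, whose gradient is 2w: the weight
# descends (with some momentum-driven overshoot) toward w = 0.
w, prev = 5.0, 0.0
for _ in range(100):
    w, prev = update_weight(w, 2.0 * w, prev)
```

The defaults of 0.1 and 0.8 match the typical learning-rate and momentum values quoted elsewhere in this deck; with momentum too high the weight oscillates instead of settling.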
Network Training
Network Training
How do you ensure that a network has been well trained?
- Objective: to achieve good generalization accuracy on new examples/cases
- Establish a maximum acceptable error rate
- Train the network using a validation test set to tune it
- Validate the trained network against a separate test set, usually referred to as a production test set
Network Training
Approach #1: Large Sample (when the amount of available data is large)
- Divide the available examples randomly: 70% training set, 30% test (production) set
- Use the training set to develop one ANN model
- Compute the test error on the test set
- Generalization error = test error
Network Training
Approach #2: Cross-validation (when the amount of available data is small)
- Divide the available examples: 90% training set, 10% test (production) set
- Repeat 10 times to develop 10 different ANN models
- Accumulate the test errors across the repetitions
- The generalization error is given by the mean test error and its standard deviation
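The 10-fold procedure can be sketched as follows. Here `train_and_test` is a hypothetical callback that builds a model on the training folds and returns its error rate on the held-out fold; everything else is plain bookkeeping:

```python
import random

def cross_validation_error(examples, train_and_test, k=10, seed=0):
    """Estimate generalization error by k-fold cross-validation.
    Returns (mean test error, standard deviation of test error)."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k roughly equal folds
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(train_and_test(train, test))
    mean = sum(errors) / k
    std = (sum((e - mean) ** 2 for e in errors) / k) ** 0.5
    return mean, std
```

With k = 10 each model trains on 90% of the data and is tested on the remaining 10%, matching the split on the slide.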
Network Training
How do you select between two ANN designs?
- A statistical test of hypothesis is required to ensure that a significant difference exists between the error rates of the two ANN models
- If the Large Sample method has been used, apply McNemar's test*
- If Cross-validation has been used, apply a paired t test for the difference of two proportions
* This assumes a classification problem; for function approximation, use a paired t test for the difference of means
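McNemar's test looks only at the test cases where the two classifiers disagree. A minimal sketch of the chi-square form with continuity correction (the counts `b` and `c` are assumed to come from tabulating the two models' predictions on the same test set):

```python
def mcnemar(b, c):
    """McNemar's test statistic for comparing two classifiers.

    b = cases model A classified correctly and model B incorrectly,
    c = cases model B classified correctly and model A incorrectly.
    Returns the chi-square statistic (1 degree of freedom, with
    continuity correction); values above 3.84 indicate a significant
    difference at the 5% level.
    """
    if b + c == 0:
        return 0.0  # the models never disagree
    return (abs(b - c) - 1) ** 2 / (b + c)
```

Note that cases both models get right or both get wrong do not enter the statistic at all.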
Network Training
Mastering ANN parameters

  Parameter       Typical   Range
  learning rate   0.1       0.01 - 0.99
  momentum        0.8       0.1 - 0.9
  weight-cost     0.1       0.001 - 0.5

Fine tuning:
- adjust individual parameters at each node and/or connection weight
- automatic adjustment during training
Network Training
Network weight initialization
- Random initial values within +/- some range
- Use smaller weight values for nodes with many incoming connections
- Rule of thumb: the initial weight range should depend on the number of connections coming into a node (a common choice is +/- 1/sqrt(fan-in))
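A sketch of this initialization, assuming the common +/- 1/sqrt(fan-in) rule of thumb (the slide's exact formula may differ); the function name is illustrative:

```python
import math
import random

def init_weights(fan_in, rng=None):
    """Draw one initial weight per incoming connection, uniformly
    from the range +/- 1/sqrt(fan_in), so that nodes with many
    incoming connections start with proportionally smaller weights."""
    rng = rng or random.Random(0)
    limit = 1.0 / math.sqrt(fan_in)
    return [rng.uniform(-limit, limit) for _ in range(fan_in)]
```

A node with 25 inputs thus starts with weights in +/- 0.2, while a node with 4 inputs may range up to +/- 0.5, keeping the initial net input at a comparable scale in both cases.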
Network Training
Typical problems during training (plots of total error E vs. number of iterations)
Would like: a steady, rapid decline in total error
But sometimes:
- seldom a local minimum - reduce the learning or momentum parameter
- reduce the learning parameters - may indicate the data is not learnable
Data Preparation
Data Preparation
Garbage in, garbage out
- The quality of results relates directly to the quality of the data
- 50%-70% of ANN development time will be spent on data preparation
- The three steps of data preparation:
  - consolidation and cleaning
  - selection and preprocessing
  - transformation and encoding
Data Preparation
Data types and ANNs
Three basic data types:
- nominal: discrete symbolic (A, yes, small)
- ordinal: discrete numeric (-5, 3, 24)
- continuous: numeric (0.23, -45.2, 500.43)
Back-propagation ANNs accept only continuous numeric values (typically in the 0 - 1 range)
Data Preparation
Consolidation and cleaning
- Determine appropriate input attributes
- Consolidate data into a working database
- Eliminate or estimate missing values
- Remove outliers (obvious exceptions)
- Determine prior probabilities of categories and deal with volume bias
Data Preparation
Selection and preprocessing
- Select examples (random sampling); consider the number of training examples
- Reduce attribute dimensionality:
  - remove redundant and/or correlated attributes
  - combine attributes (sum, multiply, difference)
- Reduce attribute value ranges:
  - group symbolic discrete values
  - quantize continuous numeric values
Data Preparation
Transformation and encoding: discrete symbolic or numeric values
- Transform to discrete numeric values, then encode. For example, encode the value 4 as:
  - one-of-N code (0 1 0 0 0) - five inputs
  - thermometer code (1 1 1 1 0) - five inputs
  - real value (0.4)* - one input
- Consider the relationship between values: (single, married, divorced) vs. (youth, adult, senior)
* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0
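The three encodings can be sketched as follows. A 1-based position convention is assumed here (the slide's own index convention may differ), and `scaled_real` assumes a known maximum value:

```python
def one_of_n(value, n):
    """One-of-N code: a single 1 in the position for `value`."""
    return [1 if i == value else 0 for i in range(1, n + 1)]

def thermometer(value, n):
    """Thermometer code: 1s up to and including `value`, then 0s."""
    return [1 if i <= value else 0 for i in range(1, n + 1)]

def scaled_real(value, max_value):
    """Single real-valued input: the value scaled into the 0-1 range."""
    return value / max_value
```

The thermometer code preserves the ordering of ordinal values (adjacent values differ in one bit), whereas one-of-N treats every value as equally distant from every other, which is why the relationship between the values matters when choosing between them.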
Data Preparation
Transformation and encoding: continuous numeric values
- De-correlate example attributes via normalization of values:
  - Euclidean: n = x / sqrt(sum of all x^2)
  - Percentage: n = x / (sum of all x)
  - Variance-based: n = (x - mean of all x) / variance
- Scale values using a linear transform if the data is uniformly distributed, or a non-linear transform (log, power) if the distribution is skewed
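The three normalizations above, written out directly from their formulas (each takes the list of values of one attribute across the examples):

```python
import math

def euclidean_norm(xs):
    """n_i = x_i / sqrt(sum of all x^2); the result has unit length."""
    scale = math.sqrt(sum(x * x for x in xs))
    return [x / scale for x in xs]

def percentage_norm(xs):
    """n_i = x_i / (sum of all x); the result sums to 1."""
    total = sum(xs)
    return [x / total for x in xs]

def variance_norm(xs):
    """n_i = (x_i - mean of all x) / variance; the result is centered on 0."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / var for x in xs]
```

Note the variance-based form follows the slide's formula literally (dividing by the variance); the more familiar z-score divides by the standard deviation instead.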
Data Preparation
Transformation and encoding: continuous numeric values
- Encode the value 1.6 as:
  - a single real-valued number (0.16)* - OK!
  - bits of a binary number (010000) - BAD!
  - one-of-N quantized intervals (0 1 0 0 0) - NOT GREAT! (discontinuities)
  - distributed (fuzzy) overlapping intervals (0.3 0.8 0.1 0.0 0.0) - BEST!
* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0
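One way to realize the distributed overlapping-interval encoding is with triangular membership functions over the interval centers; the centers and width below are illustrative choices, not taken from the slide:

```python
def fuzzy_intervals(x, centers, width):
    """Distributed (fuzzy) coding: each interval responds in proportion
    to how close x lies to its center, so adjacent intervals overlap
    and the encoding changes smoothly - no discontinuities at the
    interval boundaries, unlike one-of-N quantization."""
    return [max(0.0, 1.0 - abs(x - c) / width) for c in centers]

# e.g. encode 1.6 over five intervals centered at 1..5
code = fuzzy_intervals(1.6, [1.0, 2.0, 3.0, 4.0, 5.0], 2.0)
```

The value 1.6 activates its nearest interval most strongly and its neighbours partially, which is the pattern the slide's (0.3 0.8 0.1 0.0 0.0) example illustrates.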
TUTORIAL #5
Develop and train a BP network on real-world data
Post-Training Analysis
Post-Training Analysis
Examining the neural net model:
- visualizing the constructed model
- detailed network analysis
Sensitivity analysis of input attributes:
- analytical techniques
- attribute elimination
Post-Training Analysis
Visualizing the constructed model
- Graphical tools can be used to display the output response as selected input variables are changed (e.g. a response surface plotted against inputs such as Size and Temp)
Post-Training Analysis
Detailed network analysis
- Hidden nodes form an internal representation
- Manual analysis of weight values is often difficult - graphics are very helpful
- Conversion to an equation or executable code
- Automated ANN-to-symbolic-logic conversion is a hot area of research
Post-Training Analysis
Sensitivity analysis of input attributes
- Analytical techniques:
  - factor analysis
  - network weight analysis
- Feature (attribute) elimination:
  - forward feature elimination
  - backward feature elimination
The ANN Application Development Process
Guidelines for using neural networks:
1. Try the best existing method first
2. Get a big training set
3. Try a net without hidden units
4. Use a sensible coding for input variables
5. Consider methods of constraining the network
6. Use a test set to prevent over-training
7. Determine confidence in generalization through cross-validation
Example Applications
- Pattern recognition (reading zip codes)
- Signal filtering (reduction of radio noise)
- Data segmentation (detection of seismic onsets)
- Data compression (TV image transmission)
- Database mining (marketing, finance analysis)
- Adaptive control (vehicle guidance)
Pros and Cons of Back-Prop
Pros and Cons of Back-Prop
Cons:
- Local minima - but not generally a concern
- Seems biologically implausible
- Space and time complexity: lengthy training times
- It's a black box - it is hard to see how it makes its decisions
- Best suited for supervised learning
- Works poorly on dense data with few input variables
Pros and Cons of Back-Prop
Pros:
- Proven training method for multi-layer nets
- Able to learn arbitrary functions (e.g. XOR)
- Most useful for non-linear mappings
- Works well with noisy data
- Generalizes well given sufficient examples
- Rapid recognition speed
- Has inspired many new learning algorithms
Other Networks and Advanced Issues
Other Networks and Advanced Issues
- Variations in feed-forward architecture:
  - jump connections to output nodes
  - hidden nodes that vary in structure
- Recurrent networks with feedback connections
- Probabilistic networks
- General regression networks
- Unsupervised self-organizing networks
THE END
Thanks for your participation!
 
