From:
anon-392689
Views: 229
Comments: 0
Bayesian Field Theory ,texas public library, the library nightclub santa barbara, ann arbor library district, tcu library
From:
anon-389206
Views: 251
Comments: 0
Bayesian Nonparametrics ,turn your iphone into an e-book, huntinton library california, microsoft office 12.0 object library download, alien wars ebook by sam harrison
Slide 1: Ben Perry – M.S. Thesis Defense A Genetic Algorithm for Learning Bayesian Network Adjacency Matrices from Data
Benjamin B. Perry Laboratory for Knowledge Discovery in Databases Kansas State University
http://www.kddresearch.org http://www.cis.ksu.edu/~bbp9857
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 2: Overview
• Bayesian Network
– Definitions and examples – Inference and learning
• •
Genetic Algorithms Structure Learning Background
– Problem – K2 algorithm – Sparse Candidate
•
Improving K2: Permutation Genetic Algorithm (GASLEAK)
– Shortcoming: greedy, sensitive to ordering – Permutation GA
•
Master’s thesis: Adjacency Matrix GA (SLAM GA)
– Rationale
• •
Evaluation with Known Bayesian Networks Summary
Kansas State University Department of Computing and Information Sciences
Ben Perry – M.S. thesis defense
Slide 3: Bayesian Belief Networks (BBNS): Definition
• Bayesian Network
– – – – Directed acyclic graph Vertices (nodes): denote events, or states of affairs (each a random variable) Edges (arcs, links): denote conditional dependencies, causalities Model of conditional dependence assertions (or CI assumptions)
•
Example (“Ben’s Presentation” BBN)
Sleep: Narcoleptic Well X1 Bad All-nighter X2
(sprinkler)
Ben is nervous: Extremely, Yes, No X4 X5 Ben’s presentation: Good, Not so good, Failed miserably
Appearance: Good, Bad
X3 Memory: Elephant, Good, Bad, None
•
P(Well, Good, Good, No, General Product (Chain)Good) =for BBNs` W) · P(G | W) · P(N | G, G) · P(G | N) Rule P(G) · P(G | n
i 1
P X 1 , X 2 , , X n P X i | parents X i
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 4: Graphical Models of Probability Distributions
• Idea
– Want: model that can be used to perform inference – Desired properties • Correlations among variables • Ability to represent functional, logical, stochastic relationships • Probability of certain events
•
Inference: Decision Support Problems
– Diagnosis (medical, equipment)
– Pattern recognition (image, speech) – Prediction
•
Want to Learn: Most Likely Model that Generates Observed Data
– Under certain assumptions (Causal Markovity), it has been shown that we can do it – Given: data D (tuples or vectors containing observed values of variables) – Return: directed graph (V, E) expressing target CPTs – NEXT: Genetic algorithms
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 5: Genetic Algorithms
• Idea
– Emulate natural process of survival of the fittest (Example: Roaches adapt) – Each generation has many diverse individuals – Each individual competes for the chance to survive – Most common approach: best individuals live to the next generation and mate
– Produce children with traits from both parents – If parents are strong, children might be stronger
•
Major components (operators)
– Fitness function – Chromosome manipulation – Cross-over (Not the “John Edward” type!), mutation
•
From (Educated?) Guess to Gold
– Initial population typically random or not much better than random – bad scores – Performs well with a non-deceptive search space and good genetic operators – Ability to escape local optima with mutations. – Not guaranteed to get the best answer, but usually gets close
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 6: Learning Structure: K2 Algorithm
• Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
FOR i 1 to n DO // arbitrary ordering of variables {x1, x2, …, xn} // find best candidate parent // max Dirichlet score WHILE (Parents[xi].Size < Max-Parents) DO Best argmaxj>i (P(D | xj Parents[xi]) RETURN ({Parents[xi] | i {1, 2, …, n}})
IF (Parents[xi] + Best).Score > Parents[xi].Score) THEN Parents[xi] += Best
•
A Logical Alarm Reduction Mechanism [Beinlich et al, 1989]
– BBN model for patient monitoring in surgical anesthesia – Vertices (37): findings (e.g., esophageal intubation), intermediates, observables – K2: found BBN different in only 1 edge21 10 22 from gold standard (elicited from expert) 13
19 6 17 28 25 1 18 26 3 7 8 30
Kansas State University Department of Computing and Information Sciences
20 27 11 29 31 32 12
15 34
23 36 35 24
16
5
4
3 7
9
33
14
2
Ben Perry – M.S. thesis defense
Slide 7: Learning Structure: K2 downfalls
• • • • • Greedy (may fall into local maxima) Highly dependent upon node ordering Optimal node ordering must be given If optimal order is already known, an expert could probably create the network Number of orderings consistent with DAGs is exponential (n!)
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 8: Learning Structure: Sparse Candidate
• General Idea:
– Inspect k-best parent candidates at a time. (K2 only inspects one)
– k is typically very small ~ 5 ≤ k ≤ 15
– Exponential to the order of k
•
Algorithm:
Loop until no improvements or iteration limit exceeds: For each node, select the top k parent candidates (mutual information or m_disc) [Restrict] Build a network by manipulating parents (add, remove, reverse from candidate set for each node) . Only accept changes that maximizes the network score (Minimum Descriptor Length) [Maximize phase]
•
Must handle cycles.. expensive.
– K2 gives this to us for free – Next: Improving K2
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 9: GASLEAK: A Permutation GA for Variable Ordering
Dtrain (Structure Learning)
D:
Training Data
[2] Representation Evaluator for Bayesian Network Structure Learning Problems
Dval (Inference)
Ie :
Evidence Specification
f(α)
Ordering Fitness
α
Candidate Ordering
[1] Permutation Genetic Algorithm Genetic Algorithm for Structure Learning from Evidence, AIS, and K2 Ben Perry – M.S. thesis defense
ˆ α
Optimized Ordering
Kansas State University Department of Computing and Information Sciences
Slide 10: Properties of the Genetic Algorithm
• • Elitist Chromosome representation
– Integer permutation ordering – Sample chromosome in a BBN of 5 nodes might look like: 3 1 2 0 4
• •
Seeding
– Random shuffle
Operators
– Order crossover – Swap mutation
• •
Fitness
– RMSE
Job farm
– Java-based; Utilize many machines regardless of OS
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 11: GASLEAK results
• Not encouraging
– Bad fitness function or bad evidence b.v. – Many graph errors
0.996 0.982 0.969 0.955 0.941 0.927 0.913 0.899 0.885 0.871 0.858 0.844 0.830 0.816 0.802 0 200 400 600 800 1000 1200 1400
Histogram of estimated fitness for all 8! = 40320 permutations of Asia variables.
Frequency of Validation Set Fitness
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 12: Master’s Thesis: SLAM GA
• • SLAM GA – Structure Learning Adjacency Matrix Genetic Algorithm Initial population- tried several approaches:
– Completely Random Bayesian Networks (Box-Muller, Max parents)
– Many illegal structures; wrote fixCycles algorithm.
– Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate
– Performed better than random
– Aggregate of k learned networks from K2 given random orderings (cycles eliminated) – Best approach
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 13: Aggregator Instantiater
Training Data
K2 Manager
D
Random Order
1
K2
BBN
Random Order
K2 2
. . . .
BBN Aggregator BBN Aggregate BBN
Random Order
K2 k
BBN
For small networks, k=1 is best. For larger networks, k=2 is best. Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 14: SLAM GA
• Chromosome representation
– Edge matrix – n^2 bits – Each bit represents a parent edge to node. – 1 = parent, 0 = not parent
•
Operators
– Crossover: Swap parents, fix cycles.
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 15: SLAM GA: Crossover
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 16: SLAM GA
• Chromosome representation
– Edge matrix – n^2 – Each bit represents a parent edge to node. – 1 = parent, 0 = not parent
•
Operators
– Crossover: Swap parents, fix cycles. – Mutation: Reverse, delete, or add a random number of edges. Fix cycles.
•
Fitness
– Total Bayesian Dirichlet equivalence score for each node
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 17: Results - Asia
Best of first network Learned generation Actual
15 Graph Error 1 Errors
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 18: Results – Asia
Best fitness per generation
3750 3700 3650 3600 3550 3500 3450 3400 3350 3300 29 37 41 45 49 57 77 81 85 89 69 13 17 21 25 33 53 61 65 73 93 Generation 97 1 5 9 K2x1 K2x2 Rnd
Ben Perry – M.S. thesis defense
Best Fitness of Generation
Kansas State University Department of Computing and Information Sciences
Slide 19: Results - Poker
Best of first network Learned generation Actual
11Graph Errors 2 Graph Errors
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 20: Results - Poker
Best fitness per generation
2500
2000 Best Fitness of Generation
1500 K2x1 K2x2 Rnd 1000
500
0 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 Generations 97 1 5 9
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 21: Results - Golf
Best of first generation Learned network Actual
11Graph Errors 4 Graph Errors
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 22: Results - Golf
Best fitness per generation
3500
3000 Best Fitness of Generation
2500
2000
K2x1 K2x2 Rnd
1500
1000
500
0 21 53 65 85 Generation 97 33 13 17 25 29 37 41 45 49 57 61 69 73 77 81 89 93 1 5 9
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 23: Results – Boerlage92
Learned network Initial Actual
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 24: Results - Boerlage92
Boerlage92
1600 1400 1200 1000 K2x1 800 600 400 200 0 1 6 K2x2 Rnd
Best Fitness of Generation
16
21
26
31
41
46
51
56
66
71
76
81
86
91
Generation
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
96
11
36
61
Slide 25: Results - Alarm
Best network per generation
8000 7000 6000 5000 K2x1 4000 3000 2000 1000 0 13 21 25 29 33 41 49 53 57 85 89 93 Generation 97 61 65 69 17 37 45 73 77 81 1 5 9 K2x2 Rnd
Ben Perry – M.S. thesis defense
Best Fitness of Generation
Kansas State University Department of Computing and Information Sciences
Slide 26: Final Fitness Values
K2x1 K2x2 Random
Asia 3722.084 3720.6069 3722.249
Poker Golf Boerlage92 1999.395 3081.16 1228.621 2011.54 3220.985 1429.355 2001.884 3214.614 1459.587
Alarm 5006.827 7095.658 6861.285
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 27: K2 vs. SLAM GA
• K2:
– Very good if ordering is known
– Ordering is often not known
– Greedy, very dependent on ordering.
•
SLAM GA
– Stochastic; falls out of local optima trap – Can improve on bad structures learned by K2 – Takes much longer than K2
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 28: GASLEAK vs. SLAM GA
• GASLEAK:
– Gold network never recovered – Much more computationally-expensive
– K2 is run on each [new] individual each generation – Each chromosome must be scored
– Final network has many graph errors
•
SLAM GA
– For small networks, gold standard network often recovered.
– Relatively few graph errors for final network.
– Less computationally intensive
– Initial population most expensive – Each chromosome must be scored
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 29: SLAM GA: Ramifications
• • Effective structure learning algorithm
– Ideal for small networks
Improvement over GASLEAK
– SLAM GA faster in spite of same GA parameters – SLAM GA more accurate
• • •
Improvement over K2 Aggregate algorithm produces better initial population Parent-swapping crossover technique effective
– Diversifies search space while retaining past information
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 30: SLAM GA: Future Work
• • Parameter tweaking Better fitness function
– Several ‘bad’ structures score better than gold standard – GA works fine
• •
‘Intelligent’ mutation operator
– Add edges from pre-qualified set of candidate parents
New instantiation methods
– Use GASLEAK – Other structure-learning algorithms
•
Scalability
– Job farm
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences
Slide 31: Summary
• • • • • Bayesian Network Genetic Algorithms Learning Structure: K2, Sparse Candidate GASLEAK SLAM GA
Ben Perry – M.S. thesis defense
Kansas State University Department of Computing and Information Sciences