From: anon-523605
CAB Algorithms Presentation
Tags: exadata, data mining, datacleansing
Published: November 08, 2011
Slide 1
: Oracle Data Mining 11g Release 2: Overview and Demo • Charlie Berger, Sr. Director, Product Management, Data Mining Technologies, Oracle Corporation (charlie.berger@oracle.com) • Copyright © 2009 Oracle Corporation
Slide 2
: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Slide 3
: Outline • Today’s BI must go beyond simple reporting • To succeed, companies must: eliminate data movement; collapse information latency; deliver better BI through analytics • ODM makes the Database an “Analytical Database” • Enables applications “Powered by Oracle Data Mining” • Brief demonstrations: 1. Oracle Data Mining; 2. OBI EE dashboards with ODM results; 3. Oracle Sales Prospector with embedded ODM
Slide 4
: Analytics: Strategic and Mission Critical • Competing on Analytics, by Tom Davenport • “Some companies have built their very businesses on their ability to collect, analyze, and act on data.” • “Although numerous organizations are embracing analytics, only a handful have achieved this level of proficiency. But analytics competitors are the leaders in their varied fields—consumer products, finance, retail, and travel and entertainment among them.” • “Organizations are moving beyond query and reporting” - IDC 2006 • Super Crunchers, by Ian Ayres • “In the past, one could get by on intuition and experience. Times have changed. Today, the name of the game is data.” - Steven D. Levitt, author of Freakonomics • “Data-mining and statistical analysis have suddenly become cool.... Dissecting marketing, politics, and even sports, stuff this complex and important shouldn't be this much fun to read.” - Wired
Slide 5
: Competitive Advantage vs. Degree of Intelligence (figure) • Access & Reporting: standard reports (What happened?); ad hoc reports (How many, how often, where?); query/drill down (Where exactly is the problem?); alerts (What actions are needed?) • Analytics: statistical analysis (Why is this happening?); forecasting/extrapolation (What if these trends continue?); predictive modeling (What will happen next?); optimization (What’s the best that can happen?) • Competitive advantage rises with the degree of intelligence, from standard reports up to optimization • Source: Competing on Analytics, by T. Davenport & J. Harris
Slide 6
: Oracle Data Mining Option
Slide 7
: What is Data Mining? • Automatically sifts through data to find hidden patterns, discover new insights, and make predictions • Data mining can provide valuable results: predict customer behavior (Classification); predict or estimate a value (Regression); segment a population (Clustering); identify the factors most associated with a business problem (Attribute Importance); find profiles of targeted people or items (Decision Trees); determine important relationships and “market baskets” within the population (Associations); find fraudulent or “rare events” (Anomaly Detection)
Slide 8
: Oracle Data Mining Example Use Cases • Retail · Customer segmentation · Response modeling · Recommend next likely product · Profile high value customers • Banking · Credit scoring · Probability of default · Customer profitability · Customer targeting • Insurance · Risk factor identification · Claims fraud · Policy bundling · Employee retention • Higher Education · Alumni donations · Student acquisition · Student retention · At-risk student identification • Healthcare · Patient procedure recommendation · Patient outcome prediction · Fraud detection · Doctor & nurse note analysis • Life Sciences · Drug discovery & interaction · Common factors in (un)healthy patients · Cancer cell classification · Drug safety surveillance • Telecommunications · Customer churn · Identify cross-sell opportunities · Network intrusion detection • Public Sector · Taxation fraud & anomalies · Crime analysis · Pattern recognition in military surveillance • Manufacturing · Root cause analysis of defects · Warranty analysis · Reliability analysis · Yield analysis • Automotive · Feature bundling for customer segments · Supplier quality analysis · Problem diagnosis • Chemical · New compound discovery · Molecule clustering · Product yield analysis • Utilities · Predict power line / equipment failure · Product bundling · Consumer fraud detection
Slide 9
: Data Mining Provides Better Information, Valuable Insights and Predictions • Example: cell phone churners vs. loyal customers, plotted by income against customer months • Segment #1: IF CUST_MO > 14 AND INCOME < $90K, THEN Prediction = Cell Phone Churner, Confidence = 100%, Support = 8/39 • Segment #3: IF CUST_MO > 7 AND INCOME < $175K, THEN Prediction = Cell Phone Churner, Confidence = 83%, Support = 6/39 • Source: Inspired by Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, by Michael J. A. Berry and Gordon S. Linoff
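The confidence and support figures on this slide can be made concrete with a small calculation. The sketch below is plain Python, not ODM; it assumes decision-tree-style definitions (support = number of cases that satisfy the IF part, confidence = share of those cases with the predicted outcome), which is consistent with 83% being about 5 of the 6 cases in Segment #3. The `Customer` record, the `rule_stats` helper, and the ten data rows are hypothetical illustrations, not the 39 cases behind the chart.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    cust_mo: int     # months as a customer
    income: float    # annual income in dollars
    churned: bool    # observed outcome (True = churner)


def rule_stats(customers, condition, predicts_churn=True):
    """Return (support_count, total_cases, confidence) for an IF-THEN rule.

    support_count: number of cases that satisfy the IF part
    confidence:    share of those cases whose outcome matches the THEN part
    """
    matched = [c for c in customers if condition(c)]
    if not matched:
        return 0, len(customers), 0.0
    hits = sum(1 for c in matched if c.churned == predicts_churn)
    return len(matched), len(customers), hits / len(matched)


def segment_1(c):
    return c.cust_mo > 14 and c.income < 90_000


def segment_3(c):
    return c.cust_mo > 7 and c.income < 175_000


# Hypothetical toy data, not the cases used on the slide.
customers = [
    Customer(20, 60_000, True), Customer(16, 80_000, True),
    Customer(30, 85_000, True), Customer(10, 50_000, False),
    Customer(12, 150_000, True), Customer(8, 170_000, False),
    Customer(9, 120_000, True), Customer(3, 40_000, False),
    Customer(25, 200_000, False), Customer(5, 95_000, False),
]

for name, rule in [("Segment #1", segment_1), ("Segment #3", segment_3)]:
    n, total, conf = rule_stats(customers, rule)
    print(f"{name}: support = {n}/{total}, confidence = {conf:.0%}")
```

With this toy data, Segment #1 matches 3 of 10 cases, all of them churners (100%), while the broader Segment #3 matches 7 of 10 cases, 5 of them churners (about 71%): wider rules trade confidence for support.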
Slide 10
: Predicting High LTV Customers Using a Decision Tree Model • Simple model (figure): a decision tree that splits on Mortgage_Amount (> $500K / < $500K), House_Own (1 house / 2 or more homes), Age (< 42 / > 42, and <= 35 / > 35), Years_Cust (> 2 / < 2), and Salary (< $80K / > $80K), with leaves labeled LTV = Very_High, High, Medium, and Low • Example rule: IF (Mortgage_Amount > $500K AND House_Own = 2 or more AND Age > 42) THEN Probability(Lifetime Customer Value is “VERY HIGH”) = 77%, Support = 15% • Other ODM models can mine unstructured data (e.g., text comments), transactional data (e.g., purchases), etc.
Slide 11
: “Essentially, all models are wrong, but some are useful.” - George Box (one of the most influential statisticians of the 20th century and a pioneer in the areas of quality control, time series analysis, design of experiments and Bayesian inference.)
Slide 12
: Oracle Data Mining Overview (Classification) • From historic cases, the model learns the functional relationship Y = F(X1, X2, …, Xm) between the input attributes (X) and the target (Y), then predicts the target, with a confidence, for new cases.
  Historic data (target known):
  Name     Income   Age   Respond? (1 = Yes, 0 = No)
  Jones    30,000   30    1
  Smith    55,000   67    1
  Lee      25,000   23    0
  Rogers   50,000   44    0
  New data (target unknown, scored by the model):
  Name     Income   Age   Prediction   Confidence
  Campos   40,500   52    1            .85
  Horn     37,000   73    0            .74
  Habers   57,200   32    0            .93
  Berger   95,600   34    1            .65
Slide 13
: Oracle Data Mining Algorithm Summary (11g)
  Problem              | Algorithm                          | Applicability
  Classification       | Logistic Regression (GLM)          | Classical statistical technique
  Classification       | Decision Trees                     | Popular / rules / transparency
  Classification       | Naïve Bayes                        | Embedded app
  Classification       | Support Vector Machine             | Wide / narrow data / text
  Regression           | Multiple Regression (GLM)          | Classical statistical technique
  Regression           | Support Vector Machine             | Wide / narrow data / text
  Anomaly Detection    | One Class SVM                      | Lack examples
  Attribute Importance | Minimum Description Length (MDL)   | Attribute reduction; identify useful data; reduce data noise
  Association Rules    | Apriori                            | Market basket analysis; link analysis
  Clustering           | Hierarchical K-Means               | Product grouping; text mining; gene and protein analysis
  Clustering           | Hierarchical O-Cluster             | Product grouping; text mining; gene and protein analysis
  Feature Extraction   | NMF                                | Text analysis; feature reduction
Slide 14
: Traditional Analytics (SAS) Environment • Data flow: source data (Oracle, DB2, SQL Server, Teradata, external tables, etc.) → SAS work area (SAS datasets) → SAS processing (statistical functions / data mining) → process output (SAS work area) → target (e.g., Oracle) • The SAS environment requires: data movement; data duplication; loss of security
Slide 15
: Oracle Architecture • Source data (Oracle, DB2, SQL Server, Teradata, external tables, etc.) • The Oracle environment: eliminates data movement; eliminates data duplication; preserves security
Slide 16
: In-Database Data Mining • Traditional analytics: data extraction from source data → SAS work area → SAS processing → process output → target, with separate steps for data import, data preparation and transformation, model building, and model “scoring”; takes hours, days or weeks • Oracle Data Mining: data remains in the Database; embedded data preparation; model building; model “scoring”; takes secs, mins or hours • Cutting-edge machine learning algorithms inside the SQL kernel of the Database • SQL: the most powerful language for data preparation and transformation • Results: faster time from “data” to “insights”; lower TCO, since it eliminates data movement and data duplication; maintains security
Slide 17
: In-Database Data Mining Advantages • ODM architecture provides greater performance, scalability, and data security • Data remains in the database • Fewer moving parts; shorter information latency • Straightforward inclusion within interesting and arbitrarily complex queries, e.g. (pseudo-SQL): “SELECT Customers WHERE Income > 100K AND Probability(Buy Product A) > .85;” • Real-world scalability: available for mission-critical applications • Enables pipelining of results without costly materialization • Performant and scalable: fast scoring, 2.5 million records scored in 6 seconds on a single-CPU system; real-time scoring, 100 models on a single CPU in 0.085 seconds
Slide 18
: HP Oracle Database Machine & ODM • Integrated data warehouse solution • Extreme performance: 10-100X faster than conventional DW systems • Scalability to petabytes • Enterprise-ready: complete data warehouse functionality; enterprise-level availability and security • Scoring of Oracle Data Mining models: blazingly fast performance • For example, find the US customers likely to churn:
  select cust_id
  from customers
  where region = 'US'
    and prediction_probability(churnmod, 'Y' using *) > 0.8;
Slide 19
: HP Oracle Database Machine & ODM • In 11gR2, SQL predicates and Oracle Data Mining models are pushed to the storage level for execution • For example, find the US customers likely to churn:
  select cust_id
  from customers
  where region = 'US'
    and prediction_probability(churnmod, 'Y' using *) > 0.8;
Company Confidential June 2009
Slide 20
: ODM 11gR2 Scoring: Offloaded to Exadata • Data mining scoring is executed in Exadata; in this query, the scoring function runs in Exadata:
  select cust_id
  from customers
  where region = 'US'
    and prediction_probability(churnmod, 'Y' using *) > 0.8;
• All scoring functions are offloaded to Exadata • Benefits: reduces data returned from Exadata to the Database server; reduces CPU utilization on the Database server; up to 10x performance gains
Slide 21
: “If I had one hour to save the world, I would spend fifty-five minutes defining the problem and only five minutes finding the solution” - Albert Einstein (see also http://www.wikihow.com/Define-a-Problem)
Slide 22
: Where to Start? “Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data needed to populate the investigation and models.” - James Taylor with Neil Raden, authors, Smart (Enough) Systems
Slide 23
: Oracle Data Mining and Unstructured Data • Oracle Data Mining mines unstructured (i.e., “text”) data • Include free text and comments in ODM models • Cluster and classify documents • Oracle Text is used to preprocess unstructured text
Slide 24
: Example: Simple, Predictive SQL • Select customers who are more than 85% likely to be HIGH VALUE customers and display their AGE and MORTGAGE_AMOUNT:
  SELECT * FROM (
    SELECT A.CUSTOMER_ID, A.AGE, A.MORTGAGE_AMOUNT,
           PREDICTION_PROBABILITY(INSUR_CUST_LT13126_DT, 'VERY HIGH' USING A.*) prob
    FROM CBERGER.INSUR_CUST_LTV A)
  WHERE prob > 0.85;
Slide 25
: Fraud Prediction Demo
  drop table CLAIMS_SET;
  exec dbms_data_mining.drop_model('CLAIMSMODEL');
  create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
  insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
  insert into CLAIMS_SET values ('PREP_AUTO','ON');
  commit;
  begin
    -- NULL target column: with the SVM algorithm this builds a one-class (anomaly detection) model
    dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION', 'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
  end;
  /
  -- Top 5 most suspicious claims from policy holders with prior claims
  -- (for a one-class model, class '0' means anomalous)
  select * from
    (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
            rank() over (order by prob_fraud desc) rnk
     from (select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
           from CLAIMS
           where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
  where rnk <= 5
  order by percent_fraud desc;

  POLICYNUMBER PERCENT_FRAUD        RNK
  ------------ ------------- ----------
          6532         64.78          1
          2749         64.17          2
          3440         63.22          3
           654          63.1          4
         12650         62.36          5
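The `rank() over (order by prob_fraud desc)` plus `where rnk <= 5` pattern has one subtlety: tied probabilities share a rank, and the next rank skips ahead, so the filter can return more than five rows. The following is a minimal Python sketch of that RANK() behavior; it assumes nothing about ODM itself, and the `sql_rank`/`top_n` helpers, policy IDs, and probabilities are hypothetical.

```python
def sql_rank(rows, key):
    """Mimic RANK() OVER (ORDER BY key DESC): ties share a rank, then ranks skip ahead."""
    ranked = []
    prev_value, prev_rank = None, 0
    for position, row in enumerate(sorted(rows, key=key, reverse=True), start=1):
        value = key(row)
        rank = prev_rank if position > 1 and value == prev_value else position
        ranked.append((row, rank))
        prev_value, prev_rank = value, rank
    return ranked


def top_n(rows, key, n):
    """Keep rows with RANK() <= n, like `WHERE rnk <= 5`; ties can yield more than n rows."""
    return [(row, rank) for row, rank in sql_rank(rows, key) if rank <= n]


# Hypothetical (policy, probability) pairs with a tie at 0.80.
scores = [("P1", 0.70), ("P2", 0.90), ("P3", 0.80), ("P4", 0.80), ("P5", 0.60)]
for (policy, prob), rnk in top_n(scores, key=lambda r: r[1], n=2):
    print(policy, round(prob * 100, 2), rnk)
```

Asking for the top 2 here returns three policies (P2 ranked 1, P3 and P4 both ranked 2). If exactly N rows are required, ROW_NUMBER() is the usual alternative to RANK().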
Slide 26
: Oracle Data Mining 11g • Data mining functions (server): PL/SQL & Java APIs; develop & deploy predictive analytics applications; wide range of DM algorithms (12) covering classification & regression, clustering, anomaly detection, attribute importance, feature extraction (NMF), association rules (market basket analysis), and structured & unstructured data (text mining) • Oracle Data Miner (GUI): simplified, guided data mining using wizards • Predictive Analytics: “1-click data mining” from a spreadsheet
Slide 27
: Analytical Database Changes *Everything* • It boils down to this: less data movement = faster analytics, and faster analytics = better BI throughout the enterprise • In-database capabilities: OLAP, predictive analytics, statistical functions, data mining, text mining
Slide 28
: Integration with Oracle BI EE • Oracle Data Mining results are available to Oracle BI EE administrators • Oracle BI EE defines results for end-user presentation
Slide 29
: Example: Better Information for OBI EE Reports and Dashboards • ODM’s predictions & probabilities are available in the Database for reporting using Oracle BI EE and other reporting tools
Slide 30
: Oracle SQL Statistical Functions (Free in Every Oracle Database)
Slide 31
: 11g Statistics & SQL Analytics • Ranking functions: rank, dense_rank, cume_dist, percent_rank, ntile • Descriptive statistics: DBMS_STAT_FUNCS summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values • Window aggregate functions (moving and cumulative): avg, sum, min, max, count, variance, stddev, first_value, last_value • Correlations: Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric) • LAG/LEAD functions: direct inter-row reference using offsets • Reporting aggregate functions: sum, avg, min, max, variance, stddev, count, ratio_to_report • Cross tabs, enhanced with % statistics: chi-squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa • Statistical aggregates: correlation, linear regression family, covariance • Hypothesis testing: Student t-test, F-test, binomial test, Wilcoxon signed ranks test, chi-square, Mann-Whitney test, Kolmogorov-Smirnov test, one-way ANOVA • Linear regression: fitting of an ordinary-least-squares regression line to a set of number pairs; frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions • Distribution fitting: Kolmogorov-Smirnov test, Anderson-Darling test, chi-squared test; normal, uniform, Weibull, and exponential distributions • Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition
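The linear-regression bullet can be illustrated with the textbook least-squares formulas. The sketch below is plain Python and assumes the definitions Oracle documents for its regression aggregates: slope = COVAR_POP(y, x) / VAR_POP(x) and intercept = AVG(y) - slope * AVG(x), computed only over pairs where neither value is NULL. The helper names merely echo the SQL function names; they are illustrations, not an Oracle client API, and the data is hypothetical.

```python
def _non_null_pairs(ys, xs):
    """Drop pairs where either value is NULL (None), as the SQL aggregates do."""
    return [(y, x) for y, x in zip(ys, xs) if y is not None and x is not None]


def _mean(values):
    return sum(values) / len(values)


def covar_pop(ys, xs):
    """Population covariance over non-NULL pairs (divides by n, not n - 1)."""
    pairs = _non_null_pairs(ys, xs)
    my = _mean([y for y, _ in pairs])
    mx = _mean([x for _, x in pairs])
    return sum((y - my) * (x - mx) for y, x in pairs) / len(pairs)


def regr_slope(ys, xs):
    """Least-squares slope: COVAR_POP(y, x) / VAR_POP(x) over non-NULL pairs."""
    pairs = _non_null_pairs(ys, xs)
    x_only = [x for _, x in pairs]
    return covar_pop([y for y, _ in pairs], x_only) / covar_pop(x_only, x_only)


def regr_intercept(ys, xs):
    """Least-squares intercept: AVG(y) - slope * AVG(x) over non-NULL pairs."""
    pairs = _non_null_pairs(ys, xs)
    return _mean([y for y, _ in pairs]) - regr_slope(ys, xs) * _mean([x for _, x in pairs])


# Hypothetical (y, x) data lying exactly on y = 2x + 1, plus one pair with a NULL y.
ys = [3, 5, None, 7, 9]
xs = [1, 2, 10, 3, 4]
print(regr_slope(ys, xs), regr_intercept(ys, xs))
```

Because the pair with a NULL y is discarded before any averaging, the outlying x = 10 has no influence on the fitted line.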
Slide 32
: Descriptive Statistics: MEDIAN & MODE in SQL • Median: takes numeric or datetime values and returns the middle value • Mode: returns the most common value
  A. SELECT STATS_MODE(AGE) FROM LYMPHOMA;
  B. SELECT MEDIAN(AGE) FROM LYMPHOMA;
  C. SELECT TREATMENT_PLAN, STATS_MODE(LYMPH_TYPE) FROM LYMPHOMA GROUP BY TREATMENT_PLAN;
  D. SELECT LYMPH_TYPE, MEDIAN(SIZE_REDUCTION) FROM LYMPHOMA GROUP BY LYMPH_TYPE ORDER BY MEDIAN(SIZE_REDUCTION) ASC;
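The semantics of these queries can be sketched with Python's standard `statistics` module, as an illustration of the behavior rather than Oracle's implementation. Two assumptions are worth stating: MEDIAN ignores NULLs and, for an even number of values, returns the midpoint of the two middle values; STATS_MODE returns only one value when several tie, whereas `statistics.multimode` below returns all of them. The helper names and sample rows are hypothetical, not taken from the LYMPHOMA table.

```python
import statistics


def sql_median(values):
    """Like MEDIAN: ignore NULLs (None); with an even count, return the midpoint of the two middle values."""
    non_null = [v for v in values if v is not None]
    return statistics.median(non_null) if non_null else None


def sql_modes(values):
    """Every most-common non-NULL value; STATS_MODE would return only one of them."""
    non_null = [v for v in values if v is not None]
    return statistics.multimode(non_null)


def median_by_group(rows, group_key, value_key):
    """Rough analog of query D: GROUP BY a column, ORDER BY the group median ascending.

    Assumes every group has at least one non-NULL value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    medians = {g: sql_median(vs) for g, vs in groups.items()}
    return sorted(medians.items(), key=lambda item: item[1])


# Hypothetical ages with one NULL.
ages = [23, 30, None, 30, 44, 52, 67]
print(sql_median(ages))   # prints 37.0 (midpoint of 30 and 44)
print(sql_modes(ages))    # prints [30]
```

`statistics.multimode` requires Python 3.8 or later.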
Slide 33
: Split Lot A/B Offer Testing • Offer “A” to one population and “B” to another • Over time period “t”, calculate the mean purchase amounts of customers receiving offer A and offer B • Perform a t-test to compare them • If one offer achieves statistically significantly better results than the other, give everyone the higher-performing offer
Slide 34
: Independent Samples T-Test (Pooled Variances) • This query compares the mean of AMOUNT_SOLD between men and women within CUST_INCOME_LEVEL ranges:
  SELECT substr(cust_income_level, 1, 22) income_level,
         avg(decode(cust_gender, 'M', amount_sold, null)) sold_to_men,
         avg(decode(cust_gender, 'F', amount_sold, null)) sold_to_women,
         stats_t_test_indep(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
         stats_t_test_indep(cust_gender, amount_sold) two_sided_p_value
  FROM sh.customers c, sh.sales s
  WHERE c.cust_id = s.cust_id
  GROUP BY rollup(cust_income_level)
  ORDER BY 1;
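For reference, the pooled-variance t statistic that this kind of test is based on can be computed by hand: with sample variances s1² and s2², the pooled variance is sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2), and t = (mean1 - mean2) / sqrt(sp²(1/n1 + 1/n2)). The Python sketch below computes only the statistic and its degrees of freedom; a p-value would additionally need the Student t distribution, which is left out to keep the example dependency-free. The `pooled_t_statistic` name and the purchase amounts are hypothetical, and the sign of t simply depends on which group is passed first.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances (pooled).

    Returns (t, degrees_of_freedom)."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, n1 + n2 - 2


# Hypothetical purchase amounts for customers who received offer A and offer B.
offer_a = [1, 2, 3, 4, 5]
offer_b = [2, 4, 6, 8, 10]
t, df = pooled_t_statistic(offer_a, offer_b)
print(f"t = {t:.4f}, df = {df}")
```

The pooled form assumes both groups share one variance; when that is doubtful, the unpooled (Welch) variant is the usual choice.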
Slide 35
: Correlation Functions • The CORR_S and CORR_K functions support nonparametric or rank correlation (finding correlations between expressions that are ordinal scaled) • Correlation coefficients take on a value ranging from –1 to 1, where 1 indicates a perfect relationship, –1 indicates a perfect inverse relationship, and 0 indicates no relationship • The following query determines whether there is a correlation between the AGE and WEIGHT of people, using Spearman's correlation:
  select CORR_S(AGE, WEIGHT) coefficient,
         CORR_S(AGE, WEIGHT, 'TWO_SIDED_SIG') p_value,
         substr(TREATMENT_PLAN, 1, 15) as TREATMENT_PLAN
  from CBERGER.LYMPHOMA
  group by TREATMENT_PLAN;
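By its standard definition, Spearman's coefficient is the Pearson correlation of the ranks of the two variables, with tied values given the average of the ranks they span. A minimal plain-Python sketch of that definition follows; it does not compute the 'TWO_SIDED_SIG' p-value, the function names are hypothetical, and the sample ages and weights are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical ages and weights (note the tie at 79).
ages = [34, 45, 52, 61, 70]
weights = [70, 82, 79, 90, 79]
print(round(spearman(ages, weights), 4))
```

Working on ranks is what makes the measure suitable for ordinal data: any monotonic relationship, linear or not, scores 1 (or –1).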
Slide 36
: Analytics vs. SAS • Oracle: 1. In-database analytics engine: basic statistics (free), data mining, text mining; 2. Costs (ODM: $23K/CPU): simplified environment, single server, security; 3. IT platform: SQL (standard), Java (standard) • SAS: 1. External analytical engine: basic statistics, data mining, text mining (separate: SAS EM for Text), advanced statistics; 2. Costs (SAS EM: $150K/5 users): duplicates data, annual renewal fee (AUF, ~45% each year); 3. IT platform: SAS code (proprietary)
Slide 38
: SAS In-Database Processing 3-Year Road Map • “The goal of the SAS In-Database initiative is … to achieve deeper technical integration with database providers.” • “…the SAS engine often must load and extract data over a network to and from the DBMS. This presents a series of challenges: …Network bottlenecks between SAS and the DBMS constrain access to large volumes of data … the results of the SAS processing must be transferred back to the DBMS for final storage, which further increases the cost.” • Source: SAS In-Database Processing White Paper, October 2007
Slide 39
: IDC Worldwide Business Analytics Software: Oracle • http://www.oracle.com/corporate/analyst/reports/infrastructure/bi_dw/208699e.pdf
Slide 40
: Brief Demonstrations 1. Oracle Data Mining 2. Oracle Business Intelligence EE 3. CRM Sales Prospector
Slide 41
: Oracle Data Mining + OBI EE
Slide 42
: Quick Demo: Oracle Data Mining • Scenario: insurance company • Business problem(s): 1. Better understand the business by looking at graphs of the data; 2. Identify the factors (attributes) most associated with customers who BUY_INSURANCE; 3. Target the best customers: a. Build a predictive model to understand who will be a VERY_HIGH VALUE customer, and WHY (IF… THEN… rules that can describe them); b. Predict who is likely to be a VERY_HIGH VALUE customer in the future; c. View the results in an OBI EE dashboard • Other business problems can be included as well, e.g., fraud, cross-sell, etc. • (The entire process can be automated with the PL/SQL and/or Java APIs)
Slide 43
: Oracle Data Mining + OBI EE: Understand the Data • Oracle Data Mining helps to visualize the data
Slide 44
: Oracle Data Mining + OBI EE: Target the Right Customers • Oracle Data Miner guides the analyst through the data mining process
Slide 45
: Oracle Data Mining + OBI EE: Targeting High Value Customers • Oracle Data Mining builds a model that differentiates HI_VALUE_CUSTOMERS from others
Slide 46
: Oracle Data Mining + OBI EE: Targeting High Value Customers • Oracle Data Mining creates a prioritized list of customers who are likely to be high value
Slide 47
: Integration with Oracle BI EE • Oracle Data Mining provides more information and better insight
Slide 48
: Oracle Data Mining: Know More, Do More, Spend Less • Business decision makers: make better decisions; extract more value from your data; lower your total cost of ownership • Data analysts: get results faster; get more results; easy to use • Integrators and IT: create more value for your organization; make your work easier; transform IT from a cost to a profit center
Slide 49
: Oracle Data Mining (SQL & Java) APIs
Slide 50
: HCM Prediction Demo
  drop table HCM_SET;
  exec dbms_data_mining.drop_model('HCMMODEL');
  create table HCM_SET (setting_name varchar2(30), setting_value varchar2(4000));
  insert into HCM_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
  insert into HCM_SET values ('PREP_AUTO','ON');
  commit;
  begin
    dbms_data_mining.create_model('HCMMODEL', 'CLASSIFICATION', 'EMPL_DATA', 'EMPL_ID', 'CURR_EMPL', 'HCM_SET');
  end;
  /
  -- accuracy (per-class and overall)
  col actual format a6
  select actual, round(corr*100/total,2) percent, corr, total-corr incorr, total
  from (select actual, sum(decode(actual,predicted,1,0)) corr, count(*) total
        from (select CURR_EMPL actual, prediction(HCMMODEL using *) predicted
              from EMPL_DATA_JUNE07)
        group by rollup(actual));

  ACTUAL    PERCENT       CORR     INCORR      TOTAL
  ------ ---------- ---------- ---------- ----------
  NO          84.04       3133        595       3728
  YES         80.61       8159       1963      10122
              81.53      11292       2558      13850
  Elapsed: 00:00:01.51

  -- top 5 very high value, current employees most likely to leave
  select * from
    (select empl_id, round(prob_leave*100,2) percent_leave,
            rank() over (order by prob_leave desc) rnk
     from (select empl_id, prediction_probability(HCMMODEL, 'NO' using *) prob_leave
           from EMPL_DATA_JUNE07
           where CURR_EMPL = 'YES' and LTV_BIN = 'VERY HIGH'))
  where rnk <= 5
  order by percent_leave desc;

     EMPL_ID PERCENT_LEAVE        RNK
  ---------- ------------- ----------
      772858         96.84          1
      775441         95.65          2
      777992          92.1          3
      773473         91.51          4
      771813         90.21          5
  Elapsed: 00:00:00.29
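The accuracy query counts correct predictions per actual class and uses ROLLUP to add an overall row. The same bookkeeping in plain Python looks like this; it is a sketch, the `accuracy_report` name is hypothetical, and `None` stands in for the NULL key that ROLLUP gives the total row. The (actual, predicted) pairs are rebuilt from the correct/incorrect counts reported on the slide, which is enough to reproduce its percentages.

```python
from collections import Counter


def accuracy_report(pairs):
    """Per-class and overall accuracy from (actual, predicted) pairs.

    Mirrors GROUP BY ROLLUP(actual): one entry per actual class, plus a
    grand-total entry under the key None (ROLLUP labels that row NULL)."""
    total, correct = Counter(), Counter()
    for actual, predicted in pairs:
        for key in (actual, None):
            total[key] += 1
            correct[key] += int(actual == predicted)
    return {
        key: {"percent": round(correct[key] * 100 / total[key], 2),
              "corr": correct[key],
              "incorr": total[key] - correct[key],
              "total": total[key]}
        for key in total
    }


# Rebuild (actual, predicted) pairs from the counts reported on the slide.
pairs = ([("NO", "NO")] * 3133 + [("NO", "YES")] * 595
         + [("YES", "YES")] * 8159 + [("YES", "NO")] * 1963)
for key, row in accuracy_report(pairs).items():
    print(key or "(all)", row)
```

Reporting per-class accuracy alongside the overall figure matters here because the classes are imbalanced (10,122 YES vs. 3,728 NO): the overall 81.53% is dominated by the larger class.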
Slide 51
: Predictive Analytics Use Case • The cast: Peter, a data mining analyst; Sally, a marketing manager • Peter builds a decision tree classification model, tree_model • Peter grants Sally the ability to view and score the tree model:
  GRANT SELECT MODEL ON tree_model TO Sally;
• Sally inspects the model, likes it, and wants it deployed • Sally scores the customer database using the new model and her understanding of the cost of contacting a customer, and sends the new contact list to the head of the sales department:
  -- contact_list: the target table name is not given on the original slide
  CREATE TABLE contact_list AS
  SELECT cust_name, cust_phone
  FROM customers
  WHERE prediction(Peter.tree_model cost matrix (0,5,1,0) using *) = 'responder';
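Behind a cost-matrix clause is a simple decision rule: choose the class with the lowest expected cost, not the highest probability. The slide does not say how the four numbers (0,5,1,0) map onto a 2×2 matrix, so the Python sketch below assumes one plausible reading: missing a true responder costs 5, contacting a non-responder costs 1, and correct decisions cost 0. The `min_expected_cost_class` helper, the class labels, and the probabilities are hypothetical.

```python
def min_expected_cost_class(probs, cost):
    """Return the class whose prediction has the lowest expected cost.

    probs: {class: probability that the true class is `class`}
    cost:  {(actual, predicted): cost of predicting `predicted` when the truth is `actual`}
    """
    def expected_cost(predicted):
        return sum(p * cost[(actual, predicted)] for actual, p in probs.items())
    return min(probs, key=expected_cost)


# Assumed reading of the slide's (0,5,1,0); hypothetical, not ODM's documented layout.
COST = {
    ("responder", "responder"): 0,
    ("responder", "non_responder"): 5,      # missed a customer who would have responded
    ("non_responder", "responder"): 1,      # contacted a customer who will not respond
    ("non_responder", "non_responder"): 0,
}

probs = {"responder": 0.3, "non_responder": 0.7}
print(max(probs, key=probs.get))              # most probable class: non_responder
print(min_expected_cost_class(probs, COST))   # lowest expected cost: responder
```

Under these assumed costs, contacting a customer pays off whenever the responder probability exceeds 1/6, so a customer with only a 30% predicted response rate still makes the list.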
Slide 52
: Real-time Prediction • On-the-fly, single-record apply with new data (e.g., from a call center):
  with records as (select 78000 SALARY,
                          250000 MORTGAGE_AMOUNT,
                          6 TIME_AS_CUSTOMER,
                          12 MONTHLY_CHECKS_WRITTEN,
                          55 AGE,
                          423 BANK_FUNDS,
                          'Married' MARITAL_STATUS,
                          'Nurse' PROFESSION,
                          'M' SEX,
                          4000 CREDIT_CARD_LIMITS,
                          2 N_OF_DEPENDENTS,
                          1 HOUSE_OWNERSHIP
                   from dual)
  select s.prediction prediction, s.probability probability
  from (select PREDICTION_SET(INSUR_CUST_LT68054_DT, 1 USING *) pset from records) t,
       TABLE(t.pset) s;
Slide 53
: Prediction Multiple Models/Optimization • On-the-fly scoring of one record with multiple models, then sorting by expected revenue:
  with records as (select 178255 ANNUAL_INCOME,
                          30 AGE,
                          'Bach.' EDUCATION,
                          'Married' MARITAL_STATUS,
                          'Male' SEX,
                          70 HOURS_PER_WEEK,
                          98 PAYROLL_DEDUCTION
                   from dual)
  select t.* from (
    select 'CAR_MODEL' MODEL, s1.prediction prediction, s1.probability probability,
           s1.probability*25000 as expected_revenue
    from (select PREDICTION_SET(NBMODEL_JDM, 1 USING *) pset from records) t1, TABLE(t1.pset) s1
    UNION
    select 'MOTORCYCLE_MODEL' MODEL, s2.prediction prediction, s2.probability probability,
           s2.probability*2000 as expected_revenue
    from (select PREDICTION_SET(ABNMODEL_JDM, 1 USING *) pset from records) t2, TABLE(t2.pset) s2
    UNION
    select 'TRICYCLE_MODEL' MODEL, s3.prediction prediction, s3.probability probability,
           s3.probability*50 as expected_revenue
    from (select PREDICTION_SET(TREEMODEL_JDM, 1 USING *) pset from records) t3, TABLE(t3.pset) s3
    UNION
    select 'BICYCLE_MODEL' MODEL, s4.prediction prediction, s4.probability probability,
           s4.probability*200 as expected_revenue
    from (select PREDICTION_SET(SVMCMODEL_JDM, 1 USING *) pset from records) t4, TABLE(t4.pset) s4
  ) t
  order by t.expected_revenue desc;
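The ranking step in this query (expected revenue = probability × revenue per sale, then sort descending) is easy to check outside the database. The Python sketch below reuses the per-sale revenues from the query (25000, 2000, 50, 200), but the probabilities are hypothetical placeholders for what PREDICTION_SET would return, and the `Offer`/`rank_offers` names are illustrative only.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    model: str
    probability: float        # probability of a purchase, as a model might return it
    revenue_per_sale: float

    @property
    def expected_revenue(self) -> float:
        return self.probability * self.revenue_per_sale


def rank_offers(offers):
    """Sort offers by expected revenue, highest first (ORDER BY expected_revenue DESC)."""
    return sorted(offers, key=lambda o: o.expected_revenue, reverse=True)


# Revenues per sale come from the query; probabilities are hypothetical.
offers = [
    Offer("CAR_MODEL", 0.05, 25000),
    Offer("MOTORCYCLE_MODEL", 0.90, 2000),
    Offer("TRICYCLE_MODEL", 0.99, 50),
    Offer("BICYCLE_MODEL", 0.60, 200),
]
for offer in rank_offers(offers):
    print(f"{offer.model:<17} {offer.expected_revenue:>8.2f}")
```

Note how a near-certain but cheap sale (the tricycle) ranks below an unlikely but expensive one (the car).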
Slide 54
: Oracle Sales Prospector
Slide 55
: Larry Ellison, Oracle Open World Keynote, November 2007 • Announces the Fusion Edge CRM On-Demand hosted application, with integrated Oracle Data Mining to mine the customer database
Slide 56
: How Can I Sell More? • Which types of customers are buying which products? • Which prospects most resemble those customers? • Which references can I use to help me close my deals?
Slide 57
: Oracle Data Mining = the Science of Selling • Oracle Sales Prospector: ODM predictions exposed via Social CRM dashboards • Oracle Database 11g: the Social CRM schema ships with Oracle Database EE 11g + Data Mining Option
Slide 58
: Oracle Data Mining predicts the likelihood of purchases • Oracle Data Mining recommends products the customer is likely to buy • Oracle Data Mining suggests likely references
Slide 59
: Oracle Retail Data Model
Slide 60
: Oracle Retail Data Model • Out of the box, Oracle Data Mining generates profiles of customers • Oracle Data Mining automatically mines data for analysis reports
Slide 61
: Summary
Slide 62
: Oracle Data Mining Summary • Powers next-generation predictive applications: rapidly build applications that automatically mine data; code once, run anywhere; parallel and distributed processing; industry-standard SQL and Java APIs • Industry leader in in-database data mining: an option to the industry-leading RDBMS, Oracle Database • Classification, regression, attribute importance • Clustering, market basket analysis, anomaly detection, feature extraction • Cutting-edge algorithms: SVM, One-Class SVM, NMF, scalable GLM
Slide 63
: Oracle Data Mining Summary • More Information from More Data • Easy to use Oracle Data Miner Graphical User Interface • Wide Range of In-Database Data Mining Algorithms and Statistics • Mine Text, Transactional, and Star Schema Data • Mine XML, Semantic RDF, Spatial, and OLAP Data • Eliminate Barriers Between Analysts and IT • Quickly Disseminate Analytical Results and Models Throughout the Organization • Include Real-Time Predictive Models and New Insights in SQL queries • Eliminate Data Movement, Maximize Security
Slide 64
: Getting Started
Slide 65
: Data Mining Projects • “The vast majority of BI professionals are excited about the prospects of data mining, but are fully mystified about where to begin or even how to prepare” • “Of those who did initiate a modeling initiative, …51% of data mining projects either never left the ground, did not realize value or the ultimate results were not measurable” • “In most cases, those who attempted an implementation ended up building excellent predictive models that answer the wrong questions” • “For any organization with annual revenues more than $50 million, employing data mining technology is not a matter of whether, but when” http://www.the-modeling-agency.com
Slide 66
: Getting Started with Oracle Data Mining • You can download a free evaluation copy of Oracle Data Mining and try it out on your own computer. See the Oracle Data Mining Administrator's Guide, which explains how to install a database and set up a user account. Download Oracle Database Enterprise Edition (10gR2 or 11g) from the Oracle Technology Network. The Oracle Data Mining Option is installed by default with Oracle Database EE. Data analysts and those new to data mining will also want to download and install Oracle Data Miner, the free, optional graphical user interface. A summary of algorithms supported by ODM with links to the documentation is posted here. • To get started quickly, Part I of ODM Concepts introduces the features and terminology of Oracle Data Mining. Then use the Oracle Data Mining Tutorial for step-by-step guidance on the Oracle Data Miner graphical interface. … You can use Oracle Data Miner (Data --> Import...) to import your own data from .csv text files and begin mining. • For application developers, the ODM Application Developer's Guide along with the Oracle Data Mining sample programs get you started writing SQL- or Java-based data mining applications. • Some additional datasets for learning Oracle Data Mining include: CUST_INSUR_LTV (dmp file), CD_BUYERS (dmp file), EMPL_DATA (dmp file), LYMPHOMA (dmp file) • Application developers can integrate predictive analytics into any report or enterprise application using ODM's server-based PL/SQL or Java APIs. See ODM Sample Programs for demo sample code. • Oracle Data Mining education through Oracle University • Installing Data Miner (Oracle By Example) • Solving Business Problems with Data Mining (Oracle By Example) • http://www.oracle.com/technology/products/bi/odm/odm_education.html
Slide 67
: More Information • Oracle Data Mining 11g: oracle.com/technology/products/bi/odm/index.html • Oracle Statistical Functions: http://www.oracle.com/technology/products/bi/stats_fns/index.html • Oracle Business Intelligence Solutions: oracle.com/bi • Search http://search.oracle.com for “oracle data mining” • Contact information: Charlie.berger@oracle.com
Slide 68
: QUESTIONS ANSWERS
Slide 69
: “This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”