From Content Storage to Scaling Smart Data 

 

 
 
Tags:  false cloud  hbase  outerthought  berlinbuzzwords 
Published:  January 10, 2012
 
Notes:
 
Slide 1: Lily: Smart data, at scale, made easy. From content storage to scaling smart data. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org, Monday 6 June 2011
Slide 2: (no text)
Slide 3: the pain [chart labels: data, need for distributed processing, Moore]
Slide 4: the pain » growth of data sets » smart businesses need to apply analytics to activities » doing business online means real-time » talent shortage
Slide 5: LILY: The Real-time Platform built for the Age of Data. We manage, track and measure your data and users, and do the mat(c)hmaking in-between: » provide you with business intelligence and analytics » harvest user profiles and learn their interests » dynamically engage your users using quality recommendations
Slide 6: where would you use lily? » large collections of data » content repositories » library catalogs » (media) asset management » product catalogs » ‘live’ archives » large groups of users » e-commerce / retail » news / media » ... if you want to use big data, but you need easy.
Slide 7: + [caption: this is where the magic happens]
Slide 8: beyond content management [diagram labels: broadcast marketing, revenue, product / service]
Slide 9: beyond content management: data + analytics [diagram labels: call to action, recommendations, personalised, revenue, product / service, audience data]
Slide 10: LILY 2.0: smart data, SMARTER DATA [diagram labels: relations, data processing, recommendations, semantic augmentation, analytics, usage metrics, domain knowledge, patterns, rules, keywords, lists, ...]
Slide 11: roadmap » now: highly scalable data repository: store, index and search » next: with real-time usage stats gathering and analytics » later: and built-in context- and user-sensitive recommendations » built on top of HBase (modeled on Google BigTable) and Solr » the same robust technology in use at Facebook, Twitter, StumbleUpon, Yahoo! » scales widely over distributed (cloud) infrastructure
Slide 12: Lily Repository Model
Slide 13: Sample Lily Schema (excerpt)

  namespaces: {
    /* Declaration of namespace prefixes. */
    "org.lilyproject.bookssample": "b",
    "org.lilyproject.vtag": "vtag"
  },
  fieldTypes: [
    { name: "b$title", valueType: { primitive: "STRING" }, scope: "versioned" },
    { name: "b$pages", valueType: { primitive: "INTEGER" }, scope: "versioned" },
    { name: "b$language", valueType: { primitive: "STRING" }, scope: "versioned" },
    { name: "b$authors", valueType: { primitive: "LINK", multiValue: true }, scope: "versioned" },
    { name: "b$name", valueType: { primitive: "STRING" }, scope: "versioned" },
    { name: "b$bio", valueType: { primitive: "STRING" }, scope: "versioned" },
    { name: "vtag$last", valueType: { primitive: "LONG" }, scope: "non_versioned" }
  ],
  recordTypes: [
    {
      name: "b$Book",
      fields: [
        { name: "b$title", mandatory: true },
        { name: "b$pages", mandatory: false },
        { name: "b$language", mandatory: false },
        { name: "b$authors", mandatory: false },
        { name: "vtag$last", mandatory: false }
      ]
    },
    ...
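To make the role of the mandatory flag in the b$Book record type concrete, here is a minimal sketch in Python. It is not Lily's API: the dictionary layout and the helper name missing_mandatory_fields are assumptions made for illustration, and only the field names and flags come from the schema excerpt.

```python
# Hypothetical in-memory copy of the b$Book record type from the schema excerpt.
BOOK_RECORD_TYPE = {
    "name": "b$Book",
    "fields": [
        {"name": "b$title", "mandatory": True},
        {"name": "b$pages", "mandatory": False},
        {"name": "b$language", "mandatory": False},
        {"name": "b$authors", "mandatory": False},
        {"name": "vtag$last", "mandatory": False},
    ],
}


def missing_mandatory_fields(record_type, record):
    """Return the names of mandatory fields that the record does not set.

    Illustrative helper only; Lily performs its own validation server-side.
    """
    return [
        field["name"]
        for field in record_type["fields"]
        if field["mandatory"] and field["name"] not in record
    ]
```

A record that sets only b$pages would be reported as missing b$title, while a record with just a title passes.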
Slide 14: Lily Architecture (deployment)
Slide 15: Lily Architecture (components)
Slide 16: HBase indexing & RowLog Library » building and querying indexes, GAE-style » need for sync/async operations » updating of secondary indexes (e.g. link tables) » feeding of Indexer (= indexes Lily content into Solr) » not: transactions » need for distribution and durability [diagram: content table with rowkey A (columns val3, foo6) and rowkey B (columns val2, foo7); index table with ordered rowkeys val2-B, val3-A]
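The index table on this slide stores the indexed value followed by the content rowkey (val2-B, val3-A), so that the table's natural key order doubles as the index order. The following Python sketch mimics that idea with a sorted in-memory list. It is an assumption-laden illustration, not the Lily hbase-index library: the class name SecondaryIndex, its methods, and the "-" separator handling are hypothetical, and values containing "-" would need proper encoding in a real key format.

```python
import bisect


class SecondaryIndex:
    """Toy secondary index: sorted keys of the form '<value>-<rowkey>'."""

    def __init__(self):
        # Kept sorted to mimic the ordered rowkeys of an HBase table.
        self._keys = []

    def put(self, value, rowkey):
        """Add an index entry pointing from a column value to a content row."""
        key = f"{value}-{rowkey}"
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def keys(self):
        """Return all index rowkeys in sorted order."""
        return list(self._keys)

    def lookup(self, value):
        """Return content rowkeys whose indexed column equals value (prefix scan)."""
        prefix = f"{value}-"
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key[len(prefix):])
        return matches


index = SecondaryIndex()
index.put("val3", "A")  # content row A holds val3
index.put("val2", "B")  # content row B holds val2
```

Looking up val2 scans only the keys that start with that value, which is the same access pattern a rowkey prefix scan gives on the real index table.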
Slide 17: The Lily Indexer » denormalization » indexing of multiple versions of a record » incremental index updating » batch index building » blob content extraction » sharding towards multiple Solr instances
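Sharding towards multiple Solr instances can be pictured as a deterministic mapping from a record ID to one shard. The slide does not say how Lily selects a shard, so the hash-modulo rule below is purely an assumption, and the function name pick_shard and the shard URLs are hypothetical.

```python
import hashlib


def pick_shard(record_id, shards):
    """Map a record ID to one shard using a stable hash (assumed strategy)."""
    if not shards:
        raise ValueError("at least one shard is required")
    # md5 is used only for a stable, well-spread hash, not for security.
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Hypothetical Solr instance URLs, for illustration only.
SHARDS = [
    "http://solr1.example.org/solr",
    "http://solr2.example.org/solr",
    "http://solr3.example.org/solr",
]
```

Because the hash is stable across processes, an incremental update to a record lands on the same shard that holds its earlier index entry, as long as the shard list does not change.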
Slide 18: status June 2011 » Lily 1.0.1 released - in development since Q4/09 » some customers - DIY retail / media / news » e-commerce platform project » Lily as the data (integration) tier » first contrib: FrogPond (annotated Java <> Lily mapper) https://bitbucket.org/calmera/frogpond
Slide 19: Next up: usage stats » sits in CRUD-path » tracks user ops against records » from both perspectives » arbitrary K/V properties: time, location, ... » automatically builds user profiles (as records) » tied to record ops » indexed access » time dimension: trending [diagram labels: record, user, interactions, recommendations, time, indexes]
Slide 20: from usage stats to recommendations ‘light’ » grouping of users based on: shared properties, shared record access » grouping of records based on: shared properties, shared user operations » result: connections, recommendations [diagram labels: record, user]
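Grouping users by shared record access can be sketched with two lookup tables built from usage events: records per user and users per record. The Python below is a minimal illustration under assumed inputs, namely (user, record) pairs; the function name recommend_light and the scoring rule (count how many peers touched a record) are hypothetical and not taken from Lily.

```python
from collections import defaultdict


def recommend_light(events, user):
    """Rank records the user has not seen by how many peers accessed them.

    events: iterable of (user, record) access pairs (assumed input shape).
    A peer is any other user sharing at least one accessed record.
    """
    records_by_user = defaultdict(set)
    users_by_record = defaultdict(set)
    for event_user, record in events:
        records_by_user[event_user].add(record)
        users_by_record[record].add(event_user)

    seen = records_by_user.get(user, set())
    scores = defaultdict(int)
    for record in seen:
        for peer in users_by_record[record]:
            if peer == user:
                continue
            # Each unseen record of a peer gains one point per shared record.
            for candidate in records_by_user[peer] - seen:
                scores[candidate] += 1

    # Highest score first; ties broken by record name for stable output.
    return sorted(scores, key=lambda rec: (-scores[rec], rec))


EVENTS = [
    ("ann", "r1"), ("ann", "r2"),
    ("bob", "r1"), ("bob", "r3"),
    ("cy", "r2"), ("cy", "r3"), ("cy", "r4"),
]
```

With these sample events, ann shares r1 with bob and r2 with cy, so r3 (seen by both peers) ranks ahead of r4 (seen by one).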
Slide 21: full-on recommendations » look at real-time-capable Mahout algorithms » pre-index or pre-calculate as much as possible » save as secondary indexes » present recommendations as part of record API » allow user to contribute ‘domain knowledge’ to record processing pipeline » pattern detection, keywords, ontologies, ...
Slide 22: timeline » 10/2011: Lily + usage stats » 12/2011: Lily + usage stats + light-weight analytics » 3/2012: Lily + recommendations ‘light’ » 6/2012: Lily 2.0: full-on recommendations
Slide 23: lily enterprise » adds tools: » yum/deb package repo » cluster deploy scripts (also EC2) » Admin UI » + enterprise support
Slide 24: demo (if time permits) » message: ‣to ‣from ‣parts ‣listId ‣subject ‣sender » part: ‣content ‣mediaType ‣message
Slide 25: WHERE? www.lilyproject.org
Slide 26: Thank you! For your attention, for your questions » stevenn@outerthought.org » @stevenn

   