spirn's picture
From spirn rss RSS  subscribe Subscribe

Hoard: A Scalable Memory Allocator for Multithreaded Applications 

Hoard: A Scalable Memory Allocator for Multithreaded Applications

 

 
 
Tags:  dedicated  server 
Views:  494
Published:  May 12, 2010
 
0
download

Share plick with friends Share
save to favorite
Report Abuse Report Abuse
 
Related Plicks
Afrihost - South African Dedicated Web Servers

Afrihost - South African Dedicated Web Servers

From: amgreene
Views: 220 Comments: 0
Afrihost - South African Dedicated Web Servers
 
Cccam server

Cccam server

From: anon-461507
Views: 306 Comments: 0
Cccam server
 
Cccam servers

Cccam servers

From: anon-451049
Views: 566 Comments: 0
Cccam servers
 
Afrihost - South African Dedicated Web Servers

Afrihost - South African Dedicated Web Servers

From: bugzyu
Views: 182 Comments: 0

 
See all 
 
More from this user
Measuring Marketing Performance and ROI

Measuring Marketing Performance and ROI

From: spirn
Views: 364
Comments: 0

Lecture 2a charts.ppt

Lecture 2a charts.ppt

From: spirn
Views: 52
Comments: 0

avery denninson2004Annual Report

avery denninson2004AnnualReport

From: spirn
Views: 578
Comments: 0

Providing Global Gateways to success!

Providing Global Gateways to success!

From: spirn
Views: 87
Comments: 0

Shree Gan

Shree Gan

From: spirn
Views: 587
Comments: 0

 
See all 
 
 
 URL:          AddThis Social Bookmark Button
Embed Thin Player: (fits in most blogs)
Embed Full Player :
 
 

Name

Email (will NOT be shown to other users)

 

 
 
Comments: (watch)
 
 
Notes:
 
Slide 1: Hoard: A Scalable Memory Allocator for Multithreaded Applications Emery Berger, Kathryn McKinley*, Robert Blumofe, Paul Wilson Department of Computer Sciences * Department of Computer Science Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 2: Motivation Parallel multithreaded programs becoming prevalent web servers, search engines, database managers, etc. run on SMP’s for high performance often embarrassingly parallel Memory allocation is a bottleneck prevents scaling with number of processors Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 3: Assessment Criteria for Multiprocessor Allocators Speed competitive with uniprocessor allocators on one processor Scalability performance linear with the number of processors Fragmentation (= max allocated / max in use) competitive with uniprocessor allocators worst-case and average-case Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 4: Uniprocessor Allocators on Multiprocessors Fragmentation: Excellent Very low for most programs [Wilson & Johnstone] Speed & Scalability: Poor Heap contention a single lock protects the heap Can exacerbate false sharing different processors can share cache lines Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 5: Allocator-Induced False Sharing Allocators cause false sharing! Cache lines can end up spread across a number of processors Practically all allocators do this processor 1 x1 = malloc(s); A cache line processor 2 x2 = malloc(s); thrash… thrash… Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 6: Existing Multiprocessor Allocators Speed: One concurrent heap (e.g., concurrent B-tree): too expensive too many locks/atomic updates O(log n) cost per memory operation ⇒ Fast allocators use multiple heaps Scalability: Allocator-induced false sharing and other bottlenecks Fragmentation: P-fold increase or even unbounded Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 7: Multiprocessor Allocator I: Pure Private Heaps Pure private heaps: one heap per processor. malloc gets memory processor 1 x1= malloc(s) x2= malloc(s) free(x1) x3= malloc(s) processor 2 free(x2) x4= malloc(s) from the processor's heap or the system free puts memory on the processor's heap Avoids heap contention Examples: STL, ad hoc (e.g., Cilk 4.1) = allocated by heap 1 = free, on heap 2 Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 8: How to Break Pure Private Heaps: Fragmentation Pure private heaps: memory consumption can grow without bound! Producer-consumer: processor 1 allocates processor 2 frees processor 1 x1= malloc(s) free(x1) x2= malloc(s) free(x2) x3= malloc(s) free(x3) processor 2 Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 9: Multiprocessor Allocator II: Private Heaps with Ownership Private heaps with ownership: free puts memory back on the originating processor's heap. Avoids unbounded memory consumption Examples: ptmalloc [Gloger], LKmalloc [Larson & Krishnan] processor 1 x1= malloc(s) free(x1) x2= malloc(s) free(x2) processor 2 Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 10: How to Break Private Heaps with Ownership:Fragmentation Private heaps with ownership: memory consumption can blowup by a factor of P. Round-robin producerconsumer: processor i allocates processor i+1 frees processor 1 x1= malloc(s) free(x1) x2= malloc(s) free(x2) x3=malloc(s) free(x3) processor 2 processor 3 This really happens (NDS). Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 11: So What Do We Do Now? Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 12: The Hoard Multiprocessor Memory Allocator Manages memory in page-sized superblocks of samesized objects - Avoids false sharing by not carving up cache lines - Avoids heap contention - local heaps allocate & free small blocks from their set of superblocks Adds a global heap that is a repository of superblocks When the fraction of free memory exceeds the empty fraction, moves superblocks to the global heap - Avoids blowup in memory consumption Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 13: Hoard Example Hoard: one heap per processor + a global heap malloc gets memory from a processor 1 x1= malloc(s) …some mallocs …some frees free(x7) global heap superblock on its heap. free returns memory to its Empty fraction = 1/3 superblock. If the heap is “too empty”, it moves a superblock to the global heap. Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 14: Summary of Analytical Results Worst-case memory consumption: O(n log M/m + P) [instead of O(P n log M/m)] n = memory required M = biggest object size m = smallest object size P = number of processors Best possible: O(n log M/m) [Robson] Provably low synchronization in most cases Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 15: Experiments Run on a dedicated 14-processor Sun Enterprise 300 MHz UltraSparc, 1 GB of RAM Solaris 2.7 All programs compiled with g++ version 2.95.1 Allocators: Hoard version 2.0.2 Solaris (system allocator) Ptmalloc (GNU libc – private heaps with ownership) mtmalloc (Sun’s “MT-hot” allocator) Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 16: Performance: threadtest speedup(x,P) = runtime(Solaris allocator, one processor) / runtime(x on P processors) Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 17: Performance: Larson Server-style benchmark with sharing Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 18: Performance: false sharing Each thread reads & writes heap data Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 19: Fragmentation Results On most standard uniprocessor benchmarks, Hoard’s fragmentation was low: p2c (Pascal-to-C): 1.20 LRUsim: 1.05 espresso: Ghostscript: 1.47 1.15 Within 20% of Lea’s allocator On the multiprocessor benchmarks and other codes: Fragmentation was between 1.02 and 1.24 for all but one anomalous benchmark (shbench: 3.17). Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 20: Hoard Conclusions Speed: Excellent As fast as a uniprocessor allocator on one processor amortized O(1) cost 1 lock for malloc, 2 for free Scalability: Excellent Scales linearly with the number of processors Avoids false sharing Fragmentation: Very good Worst-case is provably close to ideal Actual observed fragmentation is low Hoard: A Scalable Memory Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000
Slide 21: Hoard Heap Details “Segregated size class” allocator Size classes are logarithmicallyspaced Superblocks hold objects of one size class 8 16 24 32 40 48 sizeclass bins radix-sorted superblock lists (emptiest to fullest) empty superblocks are “recycled” Approximately radix-sorted: Allocate from mostly-full superblocks superblocks Fast removal of mostly-empty superblocks Allocator for Multithreaded Applications -- Berger et al. -- ASPLOS 2000 Hoard: A Scalable Memory

   
Time on Slide Time on Plick
Slides per Visit Slide Views Views by Location