Slide 1: CoBlitz: A Scalable Large-file Transfer Service (COS 461)
KyoungSoo Park Princeton University
Slide 2: Large-file Distribution
• Increasing demand for large files
  - Movies or software release
  - On-line movie/ downloads
  - Linux distribution
• Files are 100MB ~ tens of GB
• One-to-many downloads
How to serve large files to many clients?
  - Content Distribution Network (CDN)?
  - Peer-to-peer system?
Slide 3: What CDNs Are Optimized For
Most Web files are small (1KB ~ 100KB)
Slide 4: Why Not Web CDNs?
• Whole file caching in participating proxy
  - Optimized for 10KB objects
  - 2GB = 200,000 x 10KB
• Memory pressure
  - Working sets do not fit in memory
  - Disk access is 1000 times slower
• Waste of resources
  - More servers needed
  - Provisioning is a must
Slide 5: Peer-to-Peer?
• BitTorrent takes up ~30% of Internet bandwidth
[Diagram: peers exchanging chunks (up/down), with a torrent file and a tracker]
1. Download a “torrent” file
2. Contact the tracker
3. Enter the “swarm” network
4. Chunk exchange policy
  - Rarest chunk first or random
  - Tit-for-tat: incentive to upload
  - Optimistic unchoking
5. Validate the checksums
Benefit: extremely good use of resources!
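The "rarest chunk first" policy in step 4 can be illustrated with a minimal Python sketch. This is not BitTorrent's actual implementation: the function name `rarest_first`, the use of integer chunk indices, and the tie-break by lowest index are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def rarest_first(have: Set[int], peer_chunks: Iterable[Set[int]]) -> Optional[int]:
    """Pick the missing chunk that the fewest known peers hold.

    `have` is the set of chunk indices we already own; `peer_chunks`
    lists, per peer, the chunk indices that peer advertises.
    Ties are broken by the lowest index (an arbitrary, assumed rule).
    """
    counts: Counter = Counter()
    for chunks in peer_chunks:
        counts.update(chunks)  # one vote per peer holding the chunk
    wanted = [c for c in counts if c not in have]
    if not wanted:
        return None  # nothing left that any peer can give us
    return min(wanted, key=lambda c: (counts[c], c))
```

Requesting rare chunks first spreads them through the swarm early, so they are less likely to disappear when the few peers holding them leave.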
Slide 6: Peer-to-Peer?
• Custom software
  - Deployment is a must
  - Configurations needed
• Companies may want managed service
  - Handles flash crowds
  - Handles long-lived objects
• Performance problem
  - Hard to guarantee the service quality
  - Others are discussed later
Slide 7: What We’d Like Is
Large-file service with
  - No custom client
  - No custom server
  - No prepositioning
  - No rehosting
  - No manual provisioning
Slide 8: CoBlitz: Scalable Large-file CDN
• Reducing the problem to small-file CDN
  - Split large files into chunks
  - Distribute chunks at proxies
  - Aggregate memory/cache
  - HTTP needs no deployment
• Benefits
  - Faster than BitTorrent by 55-86% (~500%)
  - One copy from origin serves 43-55 nodes
  - Incremental build on existing CDNs
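Splitting a large file into chunks that an ordinary HTTP proxy can cache can be sketched as follows. This is an illustration only: the helper names `chunk_ranges` and `range_header` are hypothetical, and the chunk size is left as a parameter because the slides do not state it. The `bytes=first-last` syntax is the standard HTTP `Range` header form, with both ends inclusive.

```python
from typing import List, Tuple


def chunk_ranges(file_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return inclusive (first_byte, last_byte) pairs covering the file."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size > 0")
    return [
        (start, min(start + chunk_size, file_size) - 1)
        for start in range(0, file_size, chunk_size)
    ]


def range_header(first: int, last: int) -> str:
    """Format the value of an HTTP Range request header for one chunk."""
    return f"bytes={first}-{last}"
```

Each (URL, range) pair then behaves like a small, independently cacheable object, which is the workload Web CDNs are already optimized for.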
Slide 9: How It Works
CDN = Redirector + DNS Reverse Proxy
Only the reverse proxy (CDN) caches the chunks!
[Diagram: clients, agents, CDN nodes, and the origin server exchanging chunks 1-5; labels: coblitz.codeen.org, HTTP RANGE QUERY]
Slide 10: Smart Agent
• Preserves HTTP semantics
• Parallel chunk requests
[Diagram: the agent sits between the HTTP client and the CDN nodes and keeps a sliding window of “chunks”, each marked done, waiting, or no action]
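The sliding window of parallel chunk requests can be sketched in Python. This is a simplified illustration, not the CoBlitz agent: `fetch_chunk` is a caller-supplied, hypothetical function that returns the bytes of chunk `i`, the window size is fixed, and retries are omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator


def fetch_in_order(num_chunks: int,
                   fetch_chunk: Callable[[int], bytes],
                   window: int = 4) -> Iterator[bytes]:
    """Yield chunks in order while keeping up to `window` requests in flight."""
    with ThreadPoolExecutor(max_workers=window) as pool:
        pending = {}            # chunk index -> Future ("waiting" or "done")
        next_to_request = 0     # chunks at or past this index are "no action"
        for next_to_deliver in range(num_chunks):
            # Top up the window with requests ahead of the delivery point.
            while next_to_request < num_chunks and len(pending) < window:
                pending[next_to_request] = pool.submit(fetch_chunk, next_to_request)
                next_to_request += 1
            # Block on the oldest chunk so the client sees bytes in order.
            yield pending.pop(next_to_deliver).result()
```

Because the client only ever receives bytes in file order, it sees what looks like a normal HTTP response even though the chunks behind it arrive in parallel.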
Slide 11: Chunk Indexing: Consistent Hashing
Problem: How to find the node responsible for a specific chunk?
[Diagram: hash space 0 … N-1 with CDN nodes (proxies) and chunk requests X1, X2, X3, … Xk placed on it]
Static hashing
  f(x) = some_f(x) % n
  But n is dynamic for servers
  - node can go down
  - new node can join
Consistent hashing
  F(x) = some_F(x) % N (N is a large but fixed number)
  Find a live node k, where |F(k) – F(URL)| is minimum
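The consistent-hashing lookup can be sketched in a few lines of Python. The hash function (SHA-1 truncated to 64 bits), the size of the hash space `N`, and the node names are assumptions made for illustration; the slide only says `some_F` and "a large but fixed number".

```python
import hashlib
from typing import Sequence

N = 2 ** 32  # large but fixed hash space (size assumed for this sketch)


def F(key: str) -> int:
    """Map a node name or chunk URL into the fixed hash space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % N


def responsible_node(chunk_url: str, live_nodes: Sequence[str]) -> str:
    """Return the live node k that minimizes |F(k) - F(URL)|."""
    target = F(chunk_url)
    return min(live_nodes, key=lambda node: abs(F(node) - target))
```

Unlike `some_f(x) % n`, a node joining or leaving only changes the owner of chunks whose closest node was affected; every other chunk keeps its owner, so most cached chunks stay where they are.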
Slide 12: Operation & Challenges
• Has provided a public service for over 2.5 years
  - http://coblitz.codeen.org:3125/URL
• Challenges
  - Scalability & robustness
  - Peering set difference
  - Load to the origin server
Slide 13: Unilateral Peering
• Independent proximity-aware peering
  - Pick “n” close nodes around me
  - Cf. BitTorrent picks “n” nodes randomly
• Motivation
  - Partial network connectivity
    • Internet2, CANARIE nodes
    • Routing disruption
  - Isolated nodes
• Benefits
  - No synchronized maintenance problem
  - Improve both scalability & robustness
Slide 14: Peering Set Difference
• No perfect clustering by design
• Assumption
  - Close nodes share common peers
[Diagram: two close nodes with overlapping peer sets; regions labeled "Both can reach" and "Only can reach"]
Slide 15: Peering Set Difference
• Highly variable App-level RTTs
  - 10x the variance of ICMP
• High rate of change in peer set
• Close nodes share less than 50% of their peers
  - Low cache hit
  - Low memory utility
  - Excessive load to the origin
Slide 16: Peering Set Difference
• How to fix?
  - Avg RTT → min RTT
  - Increase # of samples
  - Increase # of peers
  - Hysteresis
• Close nodes share more than 90%
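Two of these fixes, using the minimum RTT and adding hysteresis, can be combined in a small Python sketch. The slides do not describe the actual mechanism, so everything here is an assumption for illustration: the function names, the millisecond margin, and the rule that a newcomer replaces the worst current peer only when it is closer by more than that margin.

```python
from typing import Dict, List, Set


def min_rtt(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Estimate each node's distance by its minimum observed RTT (ms)."""
    return {node: min(rtts) for node, rtts in samples.items() if rtts}


def update_peers(current: Set[str], samples: Dict[str, List[float]],
                 n: int, hysteresis_ms: float = 5.0) -> Set[str]:
    """Keep up to n close peers, swapping only on a clear improvement."""
    est = min_rtt(samples)
    peers = {p for p in current if p in est}   # drop peers with no samples
    candidates = sorted((k for k in est if k not in peers), key=est.get)
    for cand in candidates:
        if len(peers) < n:
            peers.add(cand)                    # free slot: just take it
            continue
        worst = max(peers, key=est.get)
        # Hysteresis: replace only if the candidate wins by a margin.
        if est[cand] + hysteresis_ms < est[worst]:
            peers.remove(worst)
            peers.add(cand)
    return peers
```

The minimum filters out application-level delay spikes, and the margin stops two nearly equidistant nodes from swapping in and out of the peer set on every measurement round.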
Slide 17: Reducing Origin Load
• Still have peering set difference
  - Critical in traffic to origin
• Proximity-based routing
  - Converge exponentially fast
  - 3-15% do one more hop
  - Implicit overlay tree
• Result
  - Origin load reduction by 5x
[Diagram: requests converging toward the origin server; label: Rerun hashing]
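One way to read "rerun hashing" is that a node receiving a chunk request repeats the hash lookup over its own peer set and forwards the request if it knows a better node, going to the origin only when it believes it is the best. The Python sketch below follows that reading, which is an assumption; the hash function, the `max_hops` limit, and the name-based tie-break are likewise illustrative choices.

```python
import hashlib
from typing import Dict, List, Set

N = 2 ** 32  # assumed size of the fixed hash space


def F(key: str) -> int:
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % N


def route(chunk_url: str, start: str, peer_sets: Dict[str, Set[str]],
          max_hops: int = 8) -> List[str]:
    """Follow a request from `start` until a node considers itself the best.

    `peer_sets[k]` is the (possibly different) set of peers known to node k.
    Each hop strictly reduces the (distance, name) pair, so no loop can form.
    """
    target = F(chunk_url)
    path = [start]
    node = start
    for _ in range(max_hops):
        candidates = peer_sets[node] | {node}
        best = min(candidates, key=lambda k: (abs(F(k) - target), k))
        if best == node:
            break  # this node would contact the origin for the chunk
        node = best
        path.append(node)
    return path
```

When peer sets differ, several nodes may each believe they own a chunk; forwarding one more hop merges those requests, which forms an implicit tree per chunk and reduces traffic to the origin.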
Slide 18: Scale Experiments
• Use all live PlanetLab nodes as clients
  - 380~400 live nodes at any time
  - Simultaneous fetch of 50MB file
• Test scenarios
  - Direct
  - BitTorrent Total/Core
  - CoBlitz uncached/cached/staggered
  - Out-of-order numbers in paper
Slide 19: Throughput Distribution
[Figure: CDF of throughput. X axis: Throughput (Kbps), 0 to 10000. Y axis: Fraction of Nodes <= X. Curves: Direct, BT-total, BT-core, In-order uncached, In-order staggered, In-order cached, Out-of-order staggered. Annotations: BT-Core, 55-86%]
Slide 20: Downloading Times
[Figure: CDF of download time. X axis: Download Time (sec), 0 to 2000. Y axis: Fraction of Nodes <= X. Curves: In-order cached, In-order staggered, In-order uncached, BT-core, BT-total, Direct]
95th percentile: 1000+ secs faster
Slide 21: Why Is BitTorrent Slow?
• In the experiments
  - No locality – randomly choose peers
  - Chunk indexing – extra communication
    • Trackerless BitTorrent – Kademlia DHT
• In practice
  - Upload capacity of typical peers is low
    • 10 to a few 100 Kbps for cable/DSL users
  - Tit for tat may not be fair
    • A few high-capacity uploaders help the most
    • BitTyrant [NSDI’07]
Slide 22: Synchronized Workload Congestion
[Diagram: congestion at the Origin Server under a synchronized workload]
Slide 23: Addressing Congestion
• Proximity-based multi-hop routing
  - Overlay tree for each chunk
• Dynamic chunk-window resizing
  - Increase by 1/log(x) (where x is the window size) if a chunk finishes faster than average
  - Decrease by 1 if a retry kills the first chunk
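The window-resizing rule can be written out as a small Python function. Several details are not given on the slide and are assumed here: the logarithm is taken as natural, windows smaller than 2 grow by a flat 1 (since 1/log(1) is undefined), and the window never drops below 1.

```python
import math


def resize_window(window: float, chunk_time: float, avg_time: float,
                  retry_killed_first: bool, min_window: float = 1.0) -> float:
    """Return the new chunk-window size after one chunk event."""
    if retry_killed_first:
        # A retry had to kill the first chunk in the window: back off by 1.
        return max(min_window, window - 1.0)
    if chunk_time < avg_time:
        # Chunk finished faster than average: grow by 1/log(x).
        step = 1.0 / math.log(window) if window >= 2.0 else 1.0
        return window + step
    return window
```

Growth slows as the window gets larger while the back-off stays constant, so a client ramps up quickly from a small window but becomes cautious once it already has many requests outstanding.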
Slide 24: Number of Failures
[Figure: bar chart of Failure Percentage (%) for Direct, BitTorrent, and CoBlitz; bar labels: 5.7, 4.3, 2.1]
Slide 25: Performance After Flash Crowds
[Figure: throughput distribution after flash crowds. X axis: Throughput (Kbps), 0 to 35000. Y axis: Fraction of Nodes > X. Curves: BitTorrent, In-order CoBlitz]
CoBlitz: 70+% > 5Mbps
BitTorrent: 20% > 5Mbps
Slide 26: Data Reuse
7 fetches for 400 nodes, 98% cache hit
[Figure: bar chart of Utility (# of nodes served / copy) for Shark, BitTorrent, and CoBlitz; bar labels: 55, 35, 7.7]
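The headline figures can be checked with simple arithmetic. The sketch below assumes that each of the 400 nodes requests a chunk once and that every cache miss results in exactly one origin fetch; the function names are illustrative.

```python
def cache_hit_rate(origin_fetches: int, requests: int) -> float:
    """Fraction of requests served without contacting the origin."""
    return 1.0 - origin_fetches / requests


def nodes_per_copy(nodes: int, origin_fetches: int) -> float:
    """Average number of nodes served by each copy pulled from the origin."""
    return nodes / origin_fetches


hit = cache_hit_rate(7, 400)      # 1 - 7/400 = 0.9825
served = nodes_per_copy(400, 7)   # 400/7, about 57 nodes per copy
```

Under these assumptions, 7 origin fetches for 400 nodes gives a hit rate of 98.25%, consistent with the 98% quoted above.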
Slide 27: Real-world Usage
• 1-2 Terabytes/day
• Fedora Core official mirror
  - US-East/West, England, Germany, Korea, Japan
• CiteSeer repository (50,000+ links)
• University Channel (podcast/video)
• Public lecture distribution by PU OIT
• Popular game patch distribution
• PlanetLab researchers
  - Stork (U of Arizona) + ~10 others
Slide 28: Fedora Core 6 Release
• October 24th, 2006
• Peak throughput 1.44Gbps
[Figure: throughput over time around the release; labels: Release point 10am, 1G, Origin Server 30-40Mbps]
Slide 29: On Fedora Core Mirror List
• Many people complained about I/O
  - Performing peak 500Mbps out of 2Gbps
    • 2 Sun x4200 w/ dual Opterons, 2G mem
    • 2.5 TB SATA-based SAN
    • All ISOs in disk cache or in-memory FS
• CoBlitz uses 100MB mem per node
  - Many PL node disks are IDEs
  - Most nodes are BW capped at 10Mbps
Slide 30: Conclusion
• Scalable large-file transfer service
• Evolution under real traffic
  - Up and running 24/7 for over 2.5 years
  - Unilateral peering, multi-hop routing, window size adjustment
• Better performance than P2P
  - Better throughput, download time
  - Far less origin traffic
Slide 31: Thank you!
More information: http://codeen.cs.princeton.edu/coblitz/
How to use: http://coblitz.codeen.org:3125/URL*
*Some content restrictions apply. See Web site for details. Contact me if you want full access!