False Flag Cyber Attack is the most current and hottest topic people are searching for everyone want to see the deatils about false flag cyber attack there are so many people who aware of cyber attack but false flag cyber attack this is new so what (more)
False Flag Cyber Attack is the most current and hottest topic people are searching for everyone want to see the deatils about false flag cyber attack there are so many people who aware of cyber attack but false flag cyber attack this is new so what is false flag and why this topic is buzzing around all over the net is it some kind of threat and there are so many rumours buzzing around on the net that it can take down the whole world internet so lets talk more about it and found the reason. (less)
From:
anon-390810
Views: 348
Comments: 0
ATHEISM IS FALSE Richard Dawkins And The Improbability Of God Delusion ,redhat library search path, free ebooks downloads, identify painting maps books men library, public libraries in new hampshire
Slide 1: High Performance Computing with Python (4 hour tutorial)
EuroPython 2011
Ian@IanOzsvald.com - EuroPy 2011
Slide 2: Goal
• Get you writing faster code for CPU-bound problems using Python • Your task is probably in pure Python, is CPU bound and can be parallelised (right?) • We're not looking at network-bound problems • Profiling + Tools == Speed
Ian@IanOzsvald.com - EuroPy 2011
Slide 3: Get the source please!
• • • •
http://tinyurl.com/europyhpc (original: http://ianozsvald.com/wp-content/hpc_tutoria ) google: “github ianozsvald”, get HPC full source (but you can do this after!)
Ian@IanOzsvald.com - EuroPy 2011
Slide 4: About me (Ian Ozsvald)
• • • • • • • A.I. researcher in industry for 12 years C, C++, (some) Java, Python for 8 years Demo'd pyCUDA and Headroid last year Lecturer on A.I. at Sussex Uni (a bit) ShowMeDo.com co-founder Python teacher, BrightonPy co-founder IanOzsvald.com - MorConsulting.com
Ian@IanOzsvald.com - EuroPy 2011
Slide 5: Overview (pre-requisites)
• • • • • • • cProfile, line_profiler, runsnake numpy Cython and ShedSkin multiprocessing ParallelPython PyPy pyCUDA
Ian@IanOzsvald.com - EuroPy 2011
Slide 6: We won't be looking at...
• • • • • • • Algorithmic choices, clusters or cloud Gnumpy (numpy->GPU) Theano (numpy(ish)->CPU/GPU) CopperHead (numpy(ish)->GPU) BottleNeck (Cython'd numpy) Map/Reduce pyOpenCL
Ian@IanOzsvald.com - EuroPy 2011
Slide 7: Something to consider
• “Proebsting's Law” • http://research.microsoft.com/en-us/um/people • Compiler advances (generally) unhelpful (sort-of – consider auto vectorisation!) • Multi-core common • Very-parallel (CUDA, OpenCL, MS AMP, APUs) should be considered
Ian@IanOzsvald.com - EuroPy 2011
Slide 8: What can we expect?
•
Close to C speeds (shootout): – http://attractivechaos.github.com/plb/ – http://shootout.alioth.debian.org/u32/which • Depends on how much work you put in • nbody JavaScript much faster than Python but we can catch it/beat it (and get close to C speed)
Ian@IanOzsvald.com - EuroPy 2011
Slide 9: Practical result - PANalytical
250 234
200 167 150 Seconds 126 100 90
81
50 20 0 Numpy+Py +More Numpy +'if' added! Numpy+Py+Cy All +Multiprocessing
Ian@IanOzsvald.com - EuroPy 2011
Slide 10: Mandelbrot results (Desktop i3)
40 35 30 25 Seconds 20 15 10 5 0 0.3 0.3 10 10 3.5 0.07 PyPy 1.5 NumExpr ShedSkin pyCUDA C Py 2.7 Numpy v. Cython np. pyCUDA py ParallelPython 9 35 36
Ian@IanOzsvald.com - EuroPy 2011
Slide 11: Our code
• • • • •
•
pure_python.py numpy_vector.py pure_python.py 1000 1000 # RUN Our two building blocks Google “github ianozsvald” -> EuroPython2011_HighPerformanceCom puting https://github.com/ianozsvald/EuroPython20
Ian@IanOzsvald.com - EuroPy 2011
Slide 12: Profiling bottlenecks
• • • • python -m cProfile -o rep.prof pure_python.py 1000 1000 import pstats p = pstats.Stats('rep.prof') p.sort_stats('cumulative').pri nt_stats(10)
Ian@IanOzsvald.com - EuroPy 2011
Slide 13: cProfile output
51923594 function calls (51923523 primitive calls) in 74.301 seconds ncalls 1 1 1 {abs} 51,414,419 12.465 ... 0.000 12.465 0.000 tottime 0.034 0.273 57.168 percall 0.034 0.273 57.168 cumtime 74.303 74.268 73.580 percall 74.303 74.268 73.580 pure_python.py:1(<module>) pure_python.py:23(calc_pure_python) pure_python.py:9(calculate_z_serial_purepython)
Ian@IanOzsvald.com - EuroPy 2011
Slide 14: RunSnakeRun
Ian@IanOzsvald.com - EuroPy 2011
Slide 15: Let's profile python.py
• • • python -m cProfile -o res.prof pure_python.py 1000 1000 runsnake res.prof Let's look at the result
Ian@IanOzsvald.com - EuroPy 2011
Slide 16: What's the problem?
• • • What's really slow? Useful from a high level... We want a line profiler!
Ian@IanOzsvald.com - EuroPy 2011
Slide 17: line_profiler.py
• kernprof.py -l -v pure_python_lineprofiler.py 1000 1000 Warning...slow! We might want to use 300 100
•
Ian@IanOzsvald.com - EuroPy 2011
Slide 18: kernprof.py output
...% Time Line Contents @profile def calculate_z_serial_purepython(q, maxiter, z): 0.0 1.1 27.8 35.8 31.9 output = [0] * len(q) for i in range(len(q)): for iteration in range(maxiter): z[i] = z[i]*z[i] + q[i] if abs(z[i]) > 2.0: =====================
Ian@IanOzsvald.com - EuroPy 2011
Slide 19: Dereferencing is slow
• • • • • • Dereferencing involves lookups – slow Our 'i' changes slowly zi = z[i]; qi = q[i] # DO IT Change all z[i] and q[i] references Run kernprof again Is it cheaper?
Ian@IanOzsvald.com - EuroPy 2011
Slide 20: We have faster code
• • pure_python_2.py is faster, we'll use this as the basis for the next steps There are tricks:
– – – – – sets over lists if possible use dict[] rather than dict.get() build-in sort is fast list comprehensions map rather than loops
Ian@IanOzsvald.com - EuroPy 2011
Slide 21: PyPy 1.5
• • • • • • Confession – I'm a newbie Probably cool tricks to learn pypy pure_python_2.py 1000 1000 PIL support, numpy isn't My (bad) code needs numpy for display (maybe you can fix that?) pypy -m cProfile -o runpypy.prof pure_python_2.py Ian@IanOzsvald.com - EuroPy 2011
Slide 22: Cython
• • • • • • Manually add types, converts to C .pyx files (built on Pyrex) Win/Mac/Lin with gcc, msvc etc 10-100* speed-up numpy integration http://cython.org/
Ian@IanOzsvald.com - EuroPy 2011
Slide 23: Cython on pure_python_2.py
• • • • • • # ./cython_pure_python Make calculate_z.py, test it works Turn calculate_z.py to .pyx Add setup.py (see Getting Started doc) python setup.py build_ext --inplace cython -a calculate_z.pyx to get profiling Ian@IanOzsvald.com - EuroPy 2011 feedback (.html)
Slide 24: Cython types
• Help Cython by adding annotations:
– – – – list q z int unsigned int # hint no negative indices with for loop complex and complex double
•
How much faster?
Ian@IanOzsvald.com - EuroPy 2011
Slide 25: Compiler directives
• •
http://wiki.cython.org/enhancements/compile We can go faster (maybe):
– – #cython: boundscheck=False #cython: wraparound=False #cython: profile=True
• • •
Profiling:
–
Check profiling works Show _2_bettermath # FAST!
Ian@IanOzsvald.com - EuroPy 2011
Slide 26: ShedSkin
• • • • • http://code.google.com/p/shedskin/ Auto-converts Python to C++ (auto type inference) Can only import modules that have been implemented No numpy, PIL etc but great for writing new fast modules 3000 SLOC 'limit', always improving
Ian@IanOzsvald.com - EuroPy 2011
Slide 27: Easy to use
• • • • • • • # ./shedskin/ shedskin shedskin1.py make ./shedskin1 1000 1000 shedskin shedskin2.py; make ./shedskin2 1000 1000 # FAST! No easy profiling, complex is slow (for now)
Ian@IanOzsvald.com - EuroPy 2011
Slide 28: numpy vectors
• • • • • http://numpy.scipy.org/ Vectors not brilliantly suited to Mandelbrot (but we'll ignore that...) numpy is very-parallel for CPUs a = numpy.array([1,2,3,4]) a *= 3 ->
– numpy.array([3,6,9,12])
Ian@IanOzsvald.com - EuroPy 2011
Slide 29: Vector outline...
# ./numpy_vector/numpy_vector.py for iteration... z = z*z + q done = np.greater(abs(z), 2.0) q = np.where(done,0+0j, q) z = np.where(done,0+0j, z) output = np.where(done, iteration, output)
Ian@IanOzsvald.com - EuroPy 2011
Slide 30: Profiling some more
• • • • • python numpy_vector.py 1000 1000 kernprof.py -l -v numpy_vector.py 300 100 How could we break out early? How big is 250,000 complex numbers? # .nbytes, .size
Ian@IanOzsvald.com - EuroPy 2011
Slide 31: Cache sizes
• • • • Modern CPUs have 2-6MB caches Tuning is hard (and may not be worthwhile) Heuristic: Either keep it tiny (<64KB) or worry about really big data sets (>20MB) # numpy_vector_2.py
Ian@IanOzsvald.com - EuroPy 2011
Slide 32: Speed vs cache size (Core2/i3)
200 180 160 140 120 Seconds 100 80 60 40 20 0 250k 90k 50k 45k 20k 10k 5k 1k 100 54 52 62 45 45 42 43 45 180
Ian@IanOzsvald.com - EuroPy 2011
Slide 33: NumExpr
• • • • • • http://code.google.com/p/numexpr/ This is magic With Intel MKL it goes even faster # ./numpy_vector_numexpr/ python numpy_vector_numexpr.py 1000 1000 Now convert your numpy_vector.py
Ian@IanOzsvald.com - EuroPy 2011
Slide 34: numpy and iteration
• • • • • Normally there's no point using numpy if we aren't using vector operations python numpy_loop.py 1000 1000 Is it any faster? Let's run kernprof.py on this and the earlier pure_python_2.py Any significant differences?
Ian@IanOzsvald.com - EuroPy 2011
Slide 35: Cython on numpy_loop.py
• • • •
•
Can low-level C give us a speed-up over vectorised C? # ./cython_numpy_loop/ http://docs.cython.org/src/tutorial/numpy.htm Your task – make .pyx, start without types, make it work from numpy_loop.py Add basic types, use cython -a
Ian@IanOzsvald.com - EuroPy 2011
Slide 36: multiprocessing
• • • •
Using all our CPUs is cool, 4 are common, 8 will be common Global Interpreter Lock (isn't our enemy) Silo'd processes are easiest to parallelise http://docs.python.org/library/multiprocessing
Ian@IanOzsvald.com - EuroPy 2011
Slide 37: multiprocessing Pool
• • • • • # ./multiprocessing/multi.py p = multiprocessing.Pool() po = p.map_async(fn, args) result = po.get() # for all po objects join the result items to make full result
Ian@IanOzsvald.com - EuroPy 2011
Slide 38: Making chunks of work
• • • • Split the work into chunks (follow my code) Splitting by number of CPUs is good Submit the jobs with map_async Get the results back, join the lists
Ian@IanOzsvald.com - EuroPy 2011
Slide 39: Code outline
• Copy my chunk code output = [] for chunk in chunks: out = calc...(chunk) output += out
Ian@IanOzsvald.com - EuroPy 2011
Slide 40: ParallelPython
• • • • • Same principle as multiprocessing but allows >1 machine with >1 CPU http://www.parallelpython.com/ Seems to work poorly with lots of data (e.g. 8MB split into 4 lists...!) We can run it locally, run it locally via ppserver.py and run it remotely too Can we demo it to another machine?
Ian@IanOzsvald.com - EuroPy 2011
Slide 41: ParallelPython + binaries
• We can ask it to use modules, other functions and our own compiled modules Works for Cython and ShedSkin Modules have to be in PYTHONPATH (or current directory for ppserver.py) parallelpython_cython_pure_pyt hon
Ian@IanOzsvald.com - EuroPy 2011
• • •
Slide 42: Challenge...
• • • • Can we send binaries (.so/.pyd) automatically? It looks like we could We'd then avoid having to deploy to remote machines ahead of time... Anybody want to help me?
Ian@IanOzsvald.com - EuroPy 2011
Slide 43: pyCUDA
• • • • NVIDIA's CUDA -> Python wrapper http://mathema.tician.de/software/pycuda Can be a pain to install... Has numpy-like interface and two lower level C interfaces
Ian@IanOzsvald.com - EuroPy 2011
Slide 44: pyCUDA demos
• • • • • • # ./pyCUDA/ I'm using float32/complex64 as my CUDA card is too old :-( (Compute 1.3) numpy-like interface is easy but slow elementwise requires C thinking sourcemodule gives you complete control Great for prototyping and moving to C
Ian@IanOzsvald.com - EuroPy 2011
Slide 45: Birds of Feather?
• • • • numpy is cool but CPU bound pyCUDA is cool and is numpy-like Could we monkey patch numpy to autorun CUDA(/openCL) if a card is present? Anyone want to chat about this?
Ian@IanOzsvald.com - EuroPy 2011
Slide 46: Future trends
• • • • • • multi-core is obvious CUDA-like systems are inevitable write-once, deploy to many targets – that would be lovely Cython+ShedSkin could be cool Parallel Cython could be cool Refactoring with rope is definitely cool
Ian@IanOzsvald.com - EuroPy 2011
Slide 47: Bits to consider
• • • • • • Cython being wired into Python (GSoC) CorePy assembly -> numpy http://numcorepy.blogspot.com/ PyPy advancing nicely GPUs being interwoven with CPUs (APU) numpy+NumExpr->GPU/CPU mix? Learning how to massively parallelise is the key Ian@IanOzsvald.com - EuroPy 2011
Slide 48: Feedback
• • • • I plan to write this up I want feedback (and maybe a testimonial if you found this helpful?) ian@ianozsvald.com Thank you :-)
Ian@IanOzsvald.com - EuroPy 2011