Another marvelous performance-tuning trick in Python

Hi Guys!

Today, I’ll be sharing another post on how one can drastically improve the performance of Python code. Last time, we took advantage of vector computing by using GPU-based computation. This time we’ll explore PyPy, an alternative Python implementation with a just-in-time (JIT) compiler, and compare it with the standard Python interpreter (CPython).


What is PyPy?

According to the standard description available on the net:

PyPy is a very compliant Python interpreter that is a worthy alternative to CPython. By installing and running your application with it, you can gain noticeable speed improvements. How much of an improvement you’ll see depends on the application you’re running.

What is a JIT (Just-In-Time) compiler?

A compiled programming language is generally faster in execution, as the compiler generates machine code for the target CPU architecture & OS. However, such programs are harder to port to another system. Examples: C, C++, etc.

Interpreted languages are easy to port to a new system. However, they lack performance. Examples: Perl, Matlab, etc.

Python falls between the two: the source is first compiled to bytecode, which the interpreter then executes. Hence, it can perform better than purely interpreted languages, but not as well as compiled languages.

This is where a just-in-time compiler comes in, taking advantage of both worlds. It identifies frequently repeated code at run time & converts those chunks into machine code for optimum performance.
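To make the bytecode step concrete, here is a minimal sketch using the standard-library dis module. It lists the bytecode instructions the interpreter executes for a small loop; a JIT goes one step further and turns such frequently executed loops into machine code while the program runs. The function name add_up is purely illustrative and is not part of the script used later in this post.

```python
import dis

def add_up(n):
    # Illustrative hot loop (hypothetical helper, not from the post's script)
    total = 0
    for i in range(1, n):
        total += i
    return total

# Collect the names of the bytecode instructions compiled for add_up.
opnames = [ins.opname for ins in dis.get_instructions(add_up)]

print(len(opnames), "bytecode instructions")
print(sorted(set(opnames)))
```

The exact instruction names differ between Python versions, so treat the printed list as informative rather than fixed.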


To prepare the environment, you need to install the following on macOS (I’m using a MacBook) –

brew install pypy3

Let’s revisit our code.

Step 1: largeCompute.py (The main script, which will be used to compare the performance of both interpreters):


##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 06-May-2021              ####
####                                      ####
#### Objective: Main calling scripts for  ####
#### normal execution.                    ####
##############################################

from timeit import default_timer as timer

def vecCompute(sizeNum):
    try:
        total = 0
        # Add i + j for every pair (i, j) in 1..sizeNum-1
        for i in range(1, sizeNum):
            for j in range(1, sizeNum):
                total += i + j
        return total
    except Exception as e:
        x = str(e)
        print('Error: ', x)

        return 0

def main():
    start = timer()

    totalM = 0
    totalM = vecCompute(100000)
    print('The result is : ' + str(totalM))

    # Elapsed wall-clock time in seconds
    duration = timer() - start
    print('It took ' + str(duration) + ' seconds to compute')

if __name__ == '__main__':
    main()


Key snippets from the above script –

for i in range(1, sizeNum):
    for j in range(1, sizeNum):
        total += i + j

The vecCompute function runs a nested loop over every pair (i, j) from 1 to sizeNum - 1, i.e. close to 100000 * 100000 iterations for the value supplied here, and adds i + j to the running total in each iteration.
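As a sanity check on what the nested loop computes, the sum has a closed form: with m = sizeNum - 1, every value of i and every value of j appears m times, so the total is m * m * (m + 1). The sketch below checks this against the loop for small sizes so that it runs instantly. The helper names are my own and not part of largeCompute.py.

```python
def loop_total(size_num):
    # Same nested loop as vecCompute, used here only with small sizes
    total = 0
    for i in range(1, size_num):
        for j in range(1, size_num):
            total += i + j
    return total

def closed_form_total(size_num):
    # With m values per loop, the sum of (i + j) over all pairs is m^2 * (m + 1)
    m = size_num - 1
    return m * m * (m + 1)

# The loop and the formula agree for a few small sizes.
for n in (2, 10, 200):
    assert loop_total(n) == closed_form_total(n)

# Value that vecCompute(100000) should return under any interpreter.
print(closed_form_total(100000))
```

This gives a quick way to confirm that both interpreters print the same result, not just different timings.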


Let’s see how it performs.

To run the script with PyPy, use the following command (depending on how it was installed, the executable may be named pypy3 instead) –

pypy largeCompute.py

Or, if PyPy is not on your PATH, mention the specific path as follows –

/Users/satyaki_de/Desktop/pypy3.7-v7.3.4-osx64/bin/pypy largeCompute.py

Performance Comparison between two interpreters

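As you can see, there is a significant performance improvement: 352.079 / 14.503 ≈ 24.276. So, I can clearly say PyPy ran this script about 24 times faster than the standard Python interpreter. For a tight numeric loop like this one, that brings Python much closer to what you would expect from compiled code such as C++.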
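When comparing timings like these, it helps to print which interpreter actually ran the script. Here is a small sketch that uses only the standard library. The function names and the reduced loop size are my own choices, so that the example also finishes quickly under the standard interpreter.

```python
import platform
from timeit import default_timer as timer

def small_compute(size_num):
    # Same shape of work as vecCompute, on a much smaller range
    total = 0
    for i in range(1, size_num):
        for j in range(1, size_num):
            total += i + j
    return total

def timed_run(size_num):
    # Returns (result, elapsed seconds) for a single run
    start = timer()
    result = small_compute(size_num)
    return result, timer() - start

# Reports 'CPython' for the standard interpreter and 'PyPy' for PyPy.
impl = platform.python_implementation()
result, seconds = timed_run(300)

print(f"{impl} {platform.python_version()}: result={result}, took {seconds:.4f}s")
```

Run the same file once with each interpreter; dividing the two elapsed times gives the speed-up factor, in the same way as the 352.079 / 14.503 calculation.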


Where not to use?

PyPy works best with pure-Python applications. Code that leans on C extensions is a different story: compatibility is limited, and calls into such extensions generally don’t benefit from the JIT. Hence, you won’t get the same benefit there. However, I strongly believe that one day we may use this for most of our use cases.

For more information, please visit this link. So, this is another short yet effective post. 🙂


So, finally, we have done it.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 😀

Note: All the data & scenarios posted here are representational, available over the internet & for educational purposes only.

Performance improvement of Python application programming

Hello guys,

Today, I’ll be demonstrating a short but significant topic. It is widely observed that, on many occasions, Python is relatively slower than other programming languages like C++, Java, or even the latest version of PHP.

I found a relatively old post with a comparison shown between Python and the other popular languages. You can find the details at this link.

However, I haven’t verified the outcome. So, I can’t comment on the final statistics provided on that link.

My purpose is to find cases where I can apply certain tricks to improve performance drastically.

One preferable option would be the use of Cython. It sits in the middle ground between C & Python & brings out the best of both worlds.

The other option would be the use of a GPU for vector computations. That would drastically increase the processing power. Today, we’ll be exploring this option.

Let’s find out what we need to prepare in our environment before we try this out.

Step – 1 (Installing dependent packages):

pip install pyopencl
pip install plaidml-keras

So, we will be taking advantage of the Keras package to use our GPU. And, the screen should look like this –

Installation Process of Python-based Packages

Once we’ve installed the packages, we’ll configure PlaidML (typically by running plaidml-setup), as shown on the next screen.

Configuration of Packages

For our case, we also install pandas; we’ll be using NumPy, which gets installed along with it as a dependency.

Installation of supplemental packages

Let’s explore our standard snippet to test this use case.

Case 1 (Normal computational code in Python):

##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 18-Jan-2020              ####
####                                      ####
#### Objective: Main calling scripts for  ####
#### normal execution.                    ####
##############################################

import numpy as np
from timeit import default_timer as timer

def pow(a, b, c):
    for i in range(a.size):
         c[i] = a[i] ** b[i]

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    pow(a, b, c)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()

Case 2 (GPU-based computational code in Python):

#################################################
#### Written By: SATYAKI DE                  ####
#### Written On: 18-Jan-2020                 ####
####                                         ####
#### Objective: Main calling scripts for     ####
#### use of GPU to speed-up the performance. ####
#################################################

import numpy as np
from timeit import default_timer as timer

# Adding GPU Instance: select PlaidML as the Keras backend.
# Note: this setting is only read when keras is imported.
from os import environ
environ["KERAS_BACKEND"] = "plaidml.keras.backend"

def pow(a, b):
    return a ** b

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    c = pow(a, b)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()

And, here comes the output for your comparisons –

Case 1 Vs Case 2:

Performance Comparisons

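As you can see, there is a significant improvement that we can achieve using this. However, it has limited scope; you won’t get these benefits everywhere. Note that in the Case 2 script as shown, the a ** b expression is evaluated by NumPy as a single vectorized operation, and the KERAS_BACKEND setting only comes into play once Keras is imported and used. Until Python itself decides to work on the performance side, you are better off exploring either of the two options that I’ve discussed here (I didn’t say much about Cython here. Maybe some other day.).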
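To see how much of the gap comes from replacing the Python-level loop with a single vectorized NumPy expression, the two versions can be compared on a smaller vector. This is a sketch under my own assumptions: the function names, the vector size of 200,000, and the fixed seed are illustrative and not taken from the original scripts, and everything here runs on the CPU.

```python
import numpy as np
from timeit import default_timer as timer

def pow_loop(a, b):
    # Element-by-element power in a Python loop, as in Case 1
    c = np.zeros(a.size, dtype=a.dtype)
    for i in range(a.size):
        c[i] = a[i] ** b[i]
    return c

def pow_vectorized(a, b):
    # One vectorized NumPy expression, as in Case 2
    return a ** b

# Assumed test data: two independent float32 vectors with a fixed seed
rng = np.random.default_rng(0)
a = rng.random(200_000, dtype=np.float32)
b = rng.random(200_000, dtype=np.float32)

start = timer()
c_loop = pow_loop(a, b)
loop_seconds = timer() - start

start = timer()
c_vec = pow_vectorized(a, b)
vec_seconds = timer() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
print("results match:", np.allclose(c_loop, c_vec))
```

Keeping the vector small makes the loop version finish in well under a second, while still showing the difference between the two approaches.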

To get the codebase, you can refer to the following GitHub link.


So, finally, we have done it.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 😀
