Hacking the performance of Python Solutions with a custom-built library

Today, I’m very excited to demonstrate an effortless new way to boost the performance of Python. This will be a short yet crisp post on improving the overall performance of a script.

Why not view the demo before going through it?


Demo

Isn’t it exciting? Let’s understand the steps to improve your code.

pip install cython

Cython is a Python-to-C compiler. It can significantly improve performance for specific tasks, especially those involving heavy computation and tight loops. Also, Cython’s syntax is very similar to Python’s, which makes it easy to learn.

Let’s consider an example where we calculate the sum of squares for a list of numbers. The code without optimization would look like this:

  • perfTest_1.py (The first, untuned Python script.)
#########################################################
#### Written By: SATYAKI DE                          ####
#### Written On: 31-Jul-2023                         ####
#### Modified On 31-Jul-2023                         ####
####                                                 ####
#### Objective: This is the main calling             ####
#### python script that will invoke the              ####
#### first version of acute computation.             ####
####                                                 ####
#########################################################
from clsConfigClient import clsConfigClient as cf

import time
start = time.time()

n_val = cf.conf['INPUT_VAL']

def compute_sum_of_squares(n):
    return sum([i**2 for i in range(n)])

n = n_val

print(compute_sum_of_squares(n))

print(f"Test - 1: Execution time: {time.time() - start} seconds")

Here, n_val contains the value 1000000000.
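For readers who don’t have my project handy, clsConfigClient is just a tiny configuration holder from my codebase. A minimal stand-in, assuming only the INPUT_VAL key used above, could look like this (the class below is a hypothetical simplification, not my full config class):

# clsConfigClient.py - hypothetical minimal stand-in; the real class in my project holds more keys
class clsConfigClient:
    conf = {
        'INPUT_VAL': 1000000000
    }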

Now, let’s optimize it using Cython, which we installed with the command shown above. You will have to create a .pyx file, say “compute.pyx”, with the following code:

cpdef double compute_sum_of_squares(int n):
    return sum([i**2 for i in range(n)])
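As a side note, Cython rewards explicit typing. A minimal sketch of a more aggressively typed variant (the function name compute_sum_of_squares_typed & the typed loop are my own additions, not part of the measured demo) could look like this:

cpdef double compute_sum_of_squares_typed(long long n):
    # Typed loop variables let Cython generate a plain C loop instead of Python objects
    cdef long long i
    cdef double total = 0.0
    for i in range(n):
        total += <double>i * <double>i
    return total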

Now, create a setup.py file to compile it:

###########################################################
#### Written By: SATYAKI DE                            ####
#### Written On: 31-Jul-2023                           ####
#### Modified On 31-Jul-2023                           ####
####                                                   ####
#### Objective: This is the main calling               ####
#### python script that will create the                ####
#### compiled library after executing the compute.pyx. ####
####                                                   ####
###########################################################

from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("compute.pyx")
)

Compile it using the command:

python setup.py build_ext --inplace

This will look like the following –
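Roughly speaking, the build step translates compute.pyx into a compute.c file & drops a compiled extension module next to it (the exact shared-library name below is an assumption; it depends on your Python version & platform):

ls
# compute.pyx  compute.c  compute.cpython-311-darwin.so  setup.py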

Finally, you can import the function from the compiled extension module inside the improved code.

  • perfTest_2.py (The second, optimized Python script that uses the compiled Cython module.)
#########################################################
#### Written By: SATYAKI DE                          ####
#### Written On: 31-Jul-2023                         ####
#### Modified On 31-Jul-2023                         ####
####                                                 ####
#### Objective: This is the main calling             ####
#### python script that will invoke the              ####
#### optimized & precompiled custom library, which   ####
#### will significantly improve the performance.     ####
####                                                 ####
#########################################################
from clsConfigClient import clsConfigClient as cf
from compute import compute_sum_of_squares

import time
start = time.time()

n_val = cf.conf['INPUT_VAL']

n = n_val

print(compute_sum_of_squares(n))

print(f"Test - 2: Execution time with multiprocessing: {time.time() - start} seconds")

By compiling to C, Cython can speed up loops and function calls, leading to a significant speedup for CPU-bound tasks.

Please note that while Cython can dramatically improve performance, it can make the code more complex and harder to debug. Therefore, starting with regular Python and switching to Cython for the performance-critical parts of the code is recommended.
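As a practical aid for finding those performance-critical parts first (my own suggestion, not part of the original demo), the standard-library profiler works well:

python -m cProfile -s cumtime perfTest_1.py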


So, finally, we’ve done it. I know this post is shorter than my earlier ones. But I think you can pick up a good hack to improve some of your long-running jobs by applying this trick.

I’ll bring some more exciting topics in the coming days from the Python verse. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 🙂

Python performance improvement with 3.11 Version

Today, we’ll share another performance improvement, this time using the latest Python 3.11 version. You can consider this a significant advancement over the past versions. Last time, I covered version 3.7 in one of my earlier posts. But we should diligently keep everyone updated on the performance upgrades, as Python is slowly catching up with some of the finest programming languages.

But, before that, I want to share the latest specs of the machine where I ran these tests (as the system has changed compared to last time).


Let us explore the base code –

##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 06-May-2021              ####
#### Modified On: 30-Oct-2022             ####
####                                      ####
#### Objective: Main calling scripts for  ####
#### normal execution.                    ####
##############################################

from timeit import default_timer as timer

def vecCompute(sizeNum):
    try:
        total = 0
        for i in range(1, sizeNum):
            for j in range(1, sizeNum):
                total += i + j
        return total
    except Exception as e:
        x = str(e)
        print('Error: ', x)

        return 0


def main():

    start = timer()

    totalM = 0
    totalM = vecCompute(100000)

    print('The result is : ' + str(totalM))
    duration = timer() - start
    print('It took ' + str(duration) + ' seconds to compute')

if __name__ == '__main__':
    main()
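If you want to reproduce a similar comparison yourself, you can simply run the same script under both interpreters (a sketch; the interpreter names & the file name largeCompute.py are assumptions that depend on how the two versions are installed on your machine):

python3.10 largeCompute.py
python3.11 largeCompute.py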

And here is the outcome comparison between 3.10 & 3.11 –

The above screenshot shows an improvement of 23% on average compared to the previous version.

These performance stats are crucial. The result shows how Python is slowly emerging as a universal language for various kinds of work and is now targeting one of its vital threads, i.e., the improvement of performance.


So, finally, we have done it.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational, available over the internet, & for educational purposes only.

Memory profiler in Python

Today, I’ll be discussing a short but critical Python topic: capturing performance metrics by profiling memory usage.

We’ll take an ordinary script & then use this package to analyze it.

But, before we start, why don’t we see the demo & then go through it?

Demo

Isn’t it exciting? Let us understand it in detail.

For this, we’ve used the following package –

pip install memory-profiler


How can you run this?

All you have to do is modify your existing Python function & add the @profile decorator. This will open up a whole new source of information for you.

#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 22-Jul-2022                     ####
#### Modified On 30-Aug-2022                     ####
####                                             ####
#### Objective: This is the main calling         ####
#### python script that will invoke the          ####
#### clsReadForm class to initiate               ####
#### the reading capability in real-time         ####
#### & display text from a formatted forms.      ####
#####################################################

# We keep the setup code in a different class as shown below.
import clsReadForm as rf

from clsConfig import clsConfig as cf

import datetime
import logging

###############################################
###           Global Section                ###
###############################################
# Instantiating all the main class

x1 = rf.clsReadForm()

###############################################
###    End of Global Section                ###
###############################################
@profile
def main():
    try:
        # Other useful variables
        debugInd = 'Y'
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        var1 = datetime.datetime.now()

        print('Start Time: ', str(var))
        # End of useful variables

        # Initiating Log Class
        general_log_path = str(cf.conf['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'readingForm.log', level=logging.INFO)

        print('Started extracting text from formatted forms!')

        # Execute all the pass
        r1 = x1.startProcess(debugInd, var)

        if (r1 == 0):
            print('Successfully extracted text from the formatted forms!')
        else:
            print('Failed to extract the text from the formatted forms!')

        var2 = datetime.datetime.now()

        c = var2 - var1
        minutes = c.total_seconds() / 60
        print('Total difference in minutes: ', str(minutes))

        print('End Time: ', str(var2))

    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()

Let us analyze the code. As you can see, we’ve taken a normal Python main function & marked it with @profile.

The next step is to run the following command –

python -m memory_profiler readingForm.py

This will trigger the script, collect the memory information for each individual line, & display it as shown in the demo.
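As a small aside (an optional variation, not used in the demo above), if you prefer to run the script directly with python readingForm.py rather than via -m memory_profiler, you can import the decorator explicitly so that @profile is defined:

# Explicit import keeps @profile defined when the script is run without -m memory_profiler
from memory_profiler import profile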

I think this will give every Python developer great insight into the quality of the code they have developed. To know more about this, you can visit the following link.

I’ll bring some more exciting topics in the coming days from the Python verse. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational, available over the internet, & for educational purposes only. Some of the images (except my photo) that we’ve used are available over the net. We don’t claim ownership of these images. There is always room for improvement, especially in the prediction quality.

Another marvelous performance-tuning trick in Python

Hi Guys!

Today, I’ll be showing another way one can drastically improve the performance of Python code. Last time, we took advantage of vector computing by using GPU-based computation. This time, we’ll explore PyPy (an alternative Python interpreter with a just-in-time compiler, whereas standard CPython is a plain interpreter).


What is PyPy?

According to the standard description available over the net ->

PyPy is a very compliant Python interpreter that is a worthy alternative to CPython. By installing and running your application with it, you can gain noticeable speed improvements. How much of an improvement you’ll see depends on the application you’re running.

What is a JIT (Just-In-Time) compiler?

A compiled programming language is always faster in execution, as it generates machine code tailored to the CPU architecture & OS. However, such programs are challenging to port to another system. Examples: C, C++, etc.

Interpreted languages are easy to port to a new system. However, they lack performance. Examples: Perl, MATLAB, etc.

Python falls between the two. Hence, it performs better than purely interpreted languages, but indeed not as well as compiled languages.

A Just-in-Time (JIT) compiler takes advantage of both worlds. It identifies the repeatedly executed code & converts those chunks into machine code at run time for optimum performance.


To prepare the environment, you need to install the following on macOS (I’m using a MacBook) –

brew install pypy3
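Once installed, you can quickly confirm which interpreter is actually executing a script (a small sanity-check snippet of my own, not part of the original post):

import platform, sys

# Prints 'CPython' or 'PyPy' along with the version string of the running interpreter
print(platform.python_implementation(), sys.version)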

Let’s revisit our code.

Step 1: largeCompute.py (The main script, which will participate in the performance test under both interpreters):


##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 06-May-2021              ####
####                                      ####
#### Objective: Main calling scripts for  ####
#### normal execution.                    ####
##############################################

from timeit import default_timer as timer

def vecCompute(sizeNum):
    try:
        total = 0
        for i in range(1, sizeNum):
            for j in range(1, sizeNum):
                total += i + j
        return total
    except Exception as e:
        x = str(e)
        print('Error: ', x)

        return 0

def main():
    start = timer()

    totalM = 0
    totalM = vecCompute(100000)

    print('The result is : ' + str(totalM))
    duration = timer() - start
    print('It took ' + str(duration) + ' seconds to compute')

if __name__ == '__main__':
    main()


Key snippets from the above script –

for i in range(1, sizeNum):
            for j in range(1, sizeNum):
                total += i + j

The vecCompute function runs 100000 * 100000 iterations (or whatever size is supplied), accumulating the value (i + j) at each iteration.
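As a quick sanity check on what that nested loop computes (my own derivation, not part of the original script): every i from 1 to sizeNum-1 is paired with every j from 1 to sizeNum-1, so the total equals 2 * (sizeNum - 1) * (1 + 2 + ... + (sizeNum - 1)) = sizeNum * (sizeNum - 1)**2, which a tiny helper can verify:

# Closed-form equivalent of vecCompute, useful only to verify the benchmark's result
def vecComputeClosedForm(sizeNum):
    return sizeNum * (sizeNum - 1) ** 2

print(vecComputeClosedForm(100000))  # should match vecCompute(100000)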


Let’s see how it performs.

To run the script with PyPy, you need to use the following command –

pypy largeCompute.py

or, you can mention the specific path as follows –

/Users/satyaki_de/Desktop/pypy3.7-v7.3.4-osx64/bin/pypy largeCompute.py
Performance Comparison between two interpreters

As you can see, there is a significant performance improvement, i.e., (352.079 / 14.503) = 24.276. So, I can clearly say it is about 24 times faster than the standard Python interpreter. This is as good as C++ code.


Where not to use?

PyPy works best with pure Python applications. It doesn’t work well with C extensions in Python, so you won’t get the same benefits there. However, I have a strong belief that one day we may use it for most of our use cases.

For more information, please visit this link. So, this is another short yet effective post. 🙂


So, finally, we have done it.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 😀

Note: All the data & scenarios posted here are representational, available over the internet, & for educational purposes only.

Performance improvement of Python application programming

Hello guys,

Today, I’ll be demonstrating a short but significant topic. It is a widespread fact that, on many occasions, Python is relatively slower than compiled, statically typed programming languages like C++ and Java, or even the latest version of PHP.

I found a relatively old post with a comparison shown between Python and the other popular languages. You can find the details at this link.

However, I haven’t verified the outcome. So, I can’t comment on the final statistics provided on that link.

My purpose is to find cases where I can apply certain tricks to improve performance drastically.

One preferable option would be the use of Cython. That occupies the middle ground between C & Python & brings out the best of both worlds.

The other option would be the use of GPU for vector computations. That would drastically increase the processing power. Today, we’ll be exploring this option.

Let’s find out what we need to prepare our environment before we try this out.

Step – 1 (Installing dependent packages):

pip install pyopencl
pip install plaidml-keras

So, we will be taking advantage of the plaidml-keras package to use our GPU. And, the screen should look like this –

Installation Process of Python-based Packages

Once we’ve installed the packages, we’ll configure them as shown on the next screen.

Configuration of Packages
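For reference, the configuration step shown above is done through PlaidML’s bundled command-line tool (assuming a standard plaidml-keras installation), which lets you pick the device that the backend will use:

plaidml-setup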

For our case, we need to install pandas, as we’ll be using numpy, which is installed by default along with it.

Installation of supplemental packages

Let’s explore our standard snippet to test this use case.

Case 1 (Normal computational code in Python):

##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 18-Jan-2020              ####
####                                      ####
#### Objective: Main calling scripts for  ####
#### normal execution.                    ####
##############################################

import numpy as np
from timeit import default_timer as timer

def pow(a, b, c):
    for i in range(a.size):
         c[i] = a[i] ** b[i]

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    pow(a, b, c)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()

Case 2 (GPU-based computational code in Python):

#################################################
#### Written By: SATYAKI DE                  ####
#### Written On: 18-Jan-2020                 ####
####                                         ####
#### Objective: Main calling scripts for     ####
#### use of GPU to speed-up the performance. ####
#################################################

import numpy as np
from timeit import default_timer as timer

# Adding GPU Instance
from os import environ
environ["KERAS_BACKEND"] = "plaidml.keras.backend"

def pow(a, b):
    return a ** b

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    c = pow(a, b)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()

And, here comes the output for your comparisons –

Case 1 Vs Case 2:

Performance Comparisons

As you can see, there is a significant improvement that we can achieve using this. However, it has limited scope; you won’t get the benefits everywhere. Unless and until Python improves on the performance side, you will need to explore either of the two options that I’ve discussed here (I didn’t say much about Cython here. Maybe some other day.).

To get the codebase, you can refer to the following GitHub link.


So, finally, we have done it.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 😀

Note: All the data & scenarios posted here are representational, available over the internet, & for educational purposes only.