Ever wondered how to harness the performance of C within the landscape of Python? The fusion of C's efficiency with Python's user-friendly syntax opens up a realm of possibilities, from optimizing complex operations to crafting high-performance libraries. In this article, I'll take you on a guided expedition through my exploration of using C code within Python programs. Rather than simply presenting solutions, I'll share the steps, challenges, and insights I encountered along the way.
Using C code within Python programs is not new and has been common practice for many years. The idea is to gain access to the fast performance of C while still leveraging the high-level functionality of Python. Python libraries like NumPy, SciPy, and pandas use C code to speed up their operations.
This article details my journey from complete novice to beginner in using C libraries within Python programs.
After some light searching, I found two common ways to access C programs from Python:
- ctypes
- Cython
Today, I will be exploring ctypes (because it sounded easier) by writing common programs in both C and Python and making them interact, or at least, in this case, making Python access the C code.
💡 Note that there might be some peculiarities when following these steps on a Windows machine, but with a few adaptations, you should be able to follow along.
Prerequisites
I already had both a C compiler and the Python interpreter installed. If you are following along with this article, check that you have them installed as well.
The easiest way to check whether a C compiler is installed is to run:
gcc --version # Apple clang version 14.0.3 (clang-1403.0.22.14.1) ...
The above command should show information about the installed C compiler.
For Python:
python --version # Python 3.11.4
Writing a “hello world” program in C
I will begin by writing the customary hello world program in C. The aim is to execute it from within the Python environment.
#include <stdio.h>
void hello(void) {
    printf("Hello from inside the DILL\n");
}
I tried to compile this program by running gcc -o hello.o hello.c but I ran into an unfamiliar error:
Undefined symbols for architecture arm64:
"_main", referenced from:
implicit entry/start for main executable
ld: symbol(s) not found for architecture arm64
Turns out I can’t just do whatever I want while writing C programs 😅. The short explanation for this error is that something called the “linker” tried to create an executable out of this file. The linker typically needs a main function to exist in order to do that. To skip the linking stage, I had to add the -c flag.
What I thought was the final command looked something like this: gcc -c -o hello.o hello.c
However, the above compiled code was not very useful to me because I needed something called a shared library to make the C code reusable as a separate module. In order to create a shared library, I simply needed to compile my code differently.
The updated compile command was gcc -shared -o hello.so hello.c
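If you would rather drive the compile step from Python, a minimal sketch could look like the following. The helper names shared_library_command and build are my own, not part of any library. The -fPIC flag is an addition that some Linux setups need for shared libraries, and the .dll suffix on Windows is an assumption about your toolchain.

```python
import subprocess
import sys


def shared_library_command(source: str, stem: str, compiler: str = "gcc") -> list:
    """Return the compiler invocation for a shared library (hypothetical helper)."""
    # Assumption: ".dll" on Windows, ".so" everywhere else (as used in this article).
    suffix = ".dll" if sys.platform == "win32" else ".so"
    # -shared asks for a shared library; -fPIC requests position-independent code.
    return [compiler, "-shared", "-fPIC", "-o", stem + suffix, source]


def build(source: str, stem: str) -> None:
    """Run the compiler and raise CalledProcessError if compilation fails."""
    subprocess.run(shared_library_command(source, stem), check=True)


print(" ".join(shared_library_command("hello.c", "hello")))
```

Calling build("hello.c", "hello") would then run the command, assuming gcc is on your PATH.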
Executing the C code from within the Python REPL
The Python REPL (Read-Eval-Print Loop) can be accessed on any machine where Python is installed by simply typing python. This launches an interactive environment where Python code can be evaluated.
First I attempted to load the library by typing the following:
import ctypes
ctypes.cdll.LoadLibrary("hello.so") # <CDLL 'hello.so', handle 8770d260 at 0x118e79650>
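💡 Notice I did not preface the “hello.so” file with its full path because I started the Python REPL from the same folder as my C code. This worked on my Mac; on Linux you may need to write "./hello.so" instead, because the loader does not search the current folder for bare file names.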
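To avoid depending on where the REPL was started, one option is to resolve the library to an absolute path before loading it. This is only a sketch: library_path and load_library are hypothetical helper names, and the folder defaults to the current working directory.

```python
import ctypes
from pathlib import Path
from typing import Optional


def library_path(name: str, base: Optional[Path] = None) -> Path:
    """Return an absolute path to the library file inside the base folder."""
    folder = base if base is not None else Path.cwd()
    return (folder / name).resolve()


def load_library(name: str, base: Optional[Path] = None) -> ctypes.CDLL:
    """Load a shared library by absolute path, failing early if it is missing."""
    path = library_path(name, base)
    if not path.exists():
        raise FileNotFoundError(f"shared library not found: {path}")
    return ctypes.CDLL(str(path))
```

In a script, passing Path(__file__).parent as base keeps the lookup relative to the script itself rather than to the shell's current folder.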
Nice! It worked. The next step was to find a way to invoke the hello function I had previously written from within the Python REPL.
So, I did what any self-respecting programmer would do: tinker. In the end, I figured out how to execute the program from Python:
import ctypes
hello_program = ctypes.cdll.LoadLibrary("hello.so")
# Execute the hello function I defined in the C code
hello_program.hello()
Works, kinda
Extending the C program to accept input
The next experiment was simple: extend the program to accept input from Python and print some output based on it.
I extended the C program like so:
#include <stdio.h>
void hello(char* name) {
    printf("Hello %s I am DILL!\n", name);
}
Compiled it one more time: gcc -shared -o hello_with_args.so hello_with_args.c
I attempted to access the C code from Python one more time like so:
import ctypes
hello = ctypes.cdll.LoadLibrary("hello_with_args.so")
hello.hello("Friday") # Hello F I am DILL
For some reason, it only displayed the first character of the argument. My first suspicion was that it had something to do with how “strings” are defined in C: char*. Luckily, reading the documentation revealed that char* is the equivalent of a bytes object in Python, while a Python str is passed as a wide-character string (wchar_t*), which is why printf stopped after the first character. Cool, so prepending b to the string should fix it, and it did. For dynamic strings, you have to explicitly encode the variable: var.encode('utf-8').
Updated code:
import ctypes
hello = ctypes.cdll.LoadLibrary("hello_with_args.so")
hello.hello(b"Friday") # Hello Friday I am DILL
Works, Nice!
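One way to catch the str/bytes mix-up earlier is to declare the argument types of a function, so ctypes refuses a str instead of silently passing it along. The sketch below uses strlen from the C standard library as a stand-in for my hello function, and loading it with ctypes.CDLL(None) is an assumption that only holds on POSIX systems such as macOS and Linux. c_length is a hypothetical helper name.

```python
import ctypes

# Assumption: POSIX. CDLL(None) exposes the C library already loaded into Python.
libc = ctypes.CDLL(None)

# Declare that strlen takes a char* and returns a size_t.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_length(text: str) -> int:
    """Encode a dynamic Python string before handing it to C."""
    return libc.strlen(text.encode("utf-8"))


error = None
try:
    libc.strlen("Friday")  # a str is now rejected instead of being mangled
except ctypes.ArgumentError as exc:
    error = exc

print(c_length("Friday"))  # 6
print(type(error).__name__)
```

The same two assignments to argtypes and restype would apply to hello in hello_with_args.so, with restype set to None because the function returns void.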
Extending the C program to return some data
The experiment wouldn’t be successful if I didn’t return some kind of output to the Python side from C. For this I rewrote the program like so:
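#include <string.h>
#include <stdlib.h>

/* Return the sum of two integers. */
int addNumbers(int a, int b) {
    return a + b;
}

/* Return a newly allocated "Hello <name>" string; the caller owns the memory. */
char* sayHello(const char* name) {
    const char* hello = "Hello ";
    // Allocate room for both strings plus the terminating '\0'
    char* result = malloc(strlen(hello) + strlen(name) + 1);
    if (result == NULL) {
        return NULL; // Allocation failed
    }
    strcpy(result, hello); // Copy the hello string into the result
    strcat(result, name);  // Append the name
    return result;
}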
The above program contains two functions, addNumbers and sayHello, that both return some data.
One thing that stood out to me, which in hindsight should have been obvious, was that I could not just concatenate two strings; I had to explicitly allocate memory for the resulting string before concatenating. That’s what the line with malloc is doing: allocating enough memory for the string "Hello ", the name variable, and the terminating null character (the + 1).
Compile one more time: gcc -shared -o return_data.so return_data.c
Accessing the data from C
import ctypes
c_program = ctypes.cdll.LoadLibrary("return_data.so")
c_program.addNumbers(2,5) # 7 <int>
No surprises here.
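One difference worth keeping in mind is that a C int has a fixed size, while Python integers grow as needed. The sketch below does not call addNumbers; it only uses ctypes.c_int, which performs no overflow checking, to model the wraparound you would typically see if the sum no longer fits. The exact limit depends on the platform's int size, usually 4 bytes.

```python
import ctypes

# Largest value a C int can hold on this platform (usually 2**31 - 1).
INT_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_int) - 1) - 1

python_sum = INT_MAX + 1                       # Python integers simply keep growing
c_style_sum = ctypes.c_int(INT_MAX + 1).value  # truncated to fit into a C int

print(python_sum)
print(c_style_sum)
```

For small numbers like 2 and 5 none of this matters, but it is a reason to pick a wider C type when large values are expected.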
Next I attempted to call the sayHello function, and I ran into a blocker:
...
response = c_program.sayHello(b"Friday")
type(response) # int
print(response) # 60096976
The response was always a seemingly random sequence of numbers which I suspected was the memory location of the string. Running on this assumption, I went down a rabbit hole of trying to find the value of data at a memory location but with no luck. Then I decided to make no assumptions and once more, read the documentation.
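For the curious, reading the data at a memory address is possible with ctypes, as long as the address is valid. The sketch below uses a buffer owned by Python as a stand-in for memory allocated in C. It also hints at why my rabbit hole led nowhere: without a declared return type, ctypes treats the result as a C int, so on a 64-bit system the pointer may be cut down to its lower 32 bits.

```python
import ctypes

# Stand-in for a string living in C memory; Python owns this buffer.
buffer = ctypes.create_string_buffer(b"Hello Friday")
address = ctypes.addressof(buffer)

# Two ways to read a null-terminated string starting at an address.
text = ctypes.string_at(address)
via_cast = ctypes.cast(address, ctypes.c_char_p).value

# What is left of the address once it is squeezed into a C int.
truncated = ctypes.c_int(address).value

print(text)      # b'Hello Friday'
print(via_cast)  # b'Hello Friday'
print(address == truncated)
```

Reading from a truncated or otherwise invalid address is likely to crash the interpreter, so this is best kept for experiments.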
The fix for the function call returning integers instead of strings is to declare the return type (restype) of the function before invoking it. This implies that one has to know beforehand the return type of the C function being invoked. In this case, the return type is c_char_p, which is the equivalent of char* in C (https://docs.python.org/3/library/ctypes.html#ctypes.c_char_p).
The entire program would then go on to look something like this:
import ctypes
c_program = ctypes.cdll.LoadLibrary("return_data.so")
c_program.addNumbers(2,5) # 7 <int>
c_program.sayHello.restype = ctypes.c_char_p
response = c_program.sayHello(b"Friday")
name = "Some guy"
response_2 = c_program.sayHello(name.encode('utf-8'))
type(response) # bytes
print(response.decode('utf-8')) # Hello Friday
print(response_2.decode('utf-8')) # Hello Some guy
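One loose end: sayHello allocates its result with malloc, and with restype set to c_char_p the pointer is converted to bytes and lost, so that memory is never freed. A possible way around this is to keep the raw pointer, copy the string, and then call free. The sketch below uses strdup from the C library as a stand-in for sayHello because it also returns malloc'ed memory. It assumes a POSIX system and that the library uses the same C runtime as Python; call_and_free is a hypothetical helper name.

```python
import ctypes

# Assumption: POSIX. CDLL(None) exposes the C library already loaded into Python.
libc = ctypes.CDLL(None)

libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p  # keep the raw pointer instead of bytes
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def call_and_free(func, *args) -> bytes:
    """Call a C function returning a malloc'ed string, copy it, then free it."""
    pointer = func(*args)
    if not pointer:
        raise MemoryError("the C function returned NULL")
    try:
        return ctypes.string_at(pointer)  # copies the bytes into Python
    finally:
        libc.free(pointer)


greeting = call_and_free(libc.strdup, b"Hello Friday")
print(greeting.decode("utf-8"))  # Hello Friday
```

For return_data.so, the same idea would mean setting c_program.sayHello.restype to ctypes.c_void_p and passing c_program.sayHello to the helper.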
Summary
In conclusion, I am happy with the results of my preliminary experiments. I have been able to:
- Learn what a shared library in C is
- Learn basic usage of the ctypes library
- Trigger a function within a C shared library
- Learn how to allocate extra memory for string concatenation in C
- Learn how to pass arguments to C programs
- Learn how to access data from the C library