Cython Speed-up notes

While I started coding on Cython I found a number of tips and tricks of what to (not) do. This is a collection of those things...

Basic tutorials and tips

For the basis, this is a list of documentation that I found useful:

Some basics tips that will speed up your code significantly:

  • Type your variables : all variables, functions inputs, local variables, global variables, etc.
  • Minimize functions called from other python libraries (avoid overheads)
  • Try defining all local function as inline
  • Learn the difference between cdef, def, and pcdef
  • If not necessary, release the GIL and make it explicit (ie. use nogil)
  • Use function decorators (e.g. @cython.boundscheck(False))

Some examples

Typing variables

An easy way to see if you are typing (correctly) all variables, is to see the annotated version of your source code. Let us compare the following three functions.

In [3]:
%load_ext Cython
In [4]:
%%cython
#--annotate
import time
import sys

cimport cython
cimport numpy as np
import numpy as np


def untyped_func(tab, tab_len, scalar):
    res = 0.
    for i in range(tab_len):
        res += tab[i] * scalar
    return res

def somewhat_typed_func(np.ndarray[double, ndim=1, mode="c"] tab not None, int tab_len, double scalar):
    cdef double res = 0.
    cdef int i
    for i in range(tab_len):
        res += tab[i] * scalar
    return res

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
cdef double typed_func(double[::1] tab, int tab_len, double scalar) nogil:
    cdef double res = 0.
    cdef int i
    for i in range(tab_len):
        res += tab[i] * scalar
    return res

We can already see that the third function typed_func, has much less yellow, which generally means less C code behind it, thus faster code. Let's benchmark them.

In [5]:
%%cython
import time
import sys

cimport cython
cimport numpy as np
import numpy as np


def untyped_func(tab, tab_len, scalar):
    res = 0.
    for i in range(tab_len):
        res += tab[i] * scalar
    return res

def somewhat_typed_func(np.ndarray[double, ndim=1, mode="c"] tab not None, int tab_len, double scalar):
    cdef double res = 0.
    cdef int i
    for i in range(tab_len):
        res += tab[i] * scalar
    return res

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
cdef double typed_func(double[::1] tab, int tab_len, double scalar) nogil:
    cdef double res = 0.
    cdef int i
    for i in range(tab_len):
        res += tab[i] * scalar
    return res

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
cdef inline double inline_typed_func(double[::1] tab, int tab_len, double scalar) nogil:
    cdef double res = 0.
    cdef int i
    for i in range(tab_len):
        res += tab[i] * scalar
    return res

cdef int L, i, loops = 1000
cdef double start, end, res
for L in [1000, 10000, 100000]:
    np_array = np.ones(L)
    print("For L = ", L)
    start = time.clock()
    res = untyped_func(np_array, L, 2.)
    end = time.clock()
    print(format((end-start) / loops * 1e6, "2f"), end=" ")
    sys.stdout.flush()
    print("μs, using the untyped_func")
    # ..................................................
    start = time.clock()
    res = somewhat_typed_func(np_array, L, 2.)
    end = time.clock()
    print(format((end-start) / loops * 1e6, "2f"), end=" ")
    sys.stdout.flush()
    print("μs, using the somewhat_typed_func")
    # ..................................................
    start = time.clock()
    res = typed_func(np_array, L, 2.)
    end = time.clock()
    print(format((end-start) / loops * 1e6, "2f"), end=" ")
    sys.stdout.flush()
    print("μs, using the typed_func")
    # ..................................................
    start = time.clock()
    res = inline_typed_func(np_array, L, 2.)
    end = time.clock()
    print(format((end-start) / loops * 1e6, "2f"), end=" ")
    sys.stdout.flush()
    print("μs, using the inline_typed_func")
For L =  1000
0.672000 μs, using the untyped_func
0.030000 μs, using the somewhat_typed_func
0.094000 μs, using the typed_func
0.016000 μs, using the inline_typed_func
For L =  10000
4.182000 μs, using the untyped_func
0.025000 μs, using the somewhat_typed_func
0.010000 μs, using the typed_func
0.019000 μs, using the inline_typed_func
For L =  100000
34.736000 μs, using the untyped_func
0.134000 μs, using the somewhat_typed_func
0.019000 μs, using the typed_func
0.019000 μs, using the inline_typed_func

Working with numpy arrays in I/O

The first challenge I was confronted to, was handling Numpy arrays. The cython part of our code takes as inputs numpy arrays, and should give as output numpy arrays as well. However, reading and writing from numpy arrays can be slow in cython. Some tutorials mentioned using memory views, other mention that C array give a clear improvement, and overall several different solutions are mentioned. A StackOverflow answer makes a good benchmark between these solutions for a code that only need to create arrays (not taking any inputs) and giving back a numpy array: https://stackoverflow.com/questions/18462785/what-is-the-recommended-way-of-allocating-memory-for-a-typed-memory-view

However, here we need to focus on the copying and accessing the data from the numpy array.

In [6]:
%%cython
import time
import sys

from cpython.array cimport array, clone
from cython.view cimport array as cvarray
from libc.stdlib cimport malloc, free
import numpy as np
cimport numpy as np

cdef int loops

def timefunc(name):
    def timedecorator(f):
        cdef int L, i

        print("Running", name)
        for L in [1, 10, 100, 1000, 10000, 100000, 1000000]:
            np_array = np.ones(L)
            start = time.clock()
            res_array = f(L, np_array)
            end = time.clock()
            print(format((end-start) / loops * 1e6, "2f"), end=" ")
            sys.stdout.flush()

        print("μs")
    return timedecorator

print()
print("-------- TESTS -------")
loops = 3000


@timefunc("numpy buffers")
def _(int L, np.ndarray[double, ndim=1, mode="c"] np_array not None):
    cdef int i, j
    cdef double d
    for i in range(loops):
        for j in range(L):
            d = np_array[j]
            np_array[j] = d*0.
    # Prevents dead code elimination
    str(np_array[0])
    return np_array
    
@timefunc("cpython.array buffer")
def _(int L, np.ndarray[double, ndim=1, mode="c"] np_array not None):
    cdef int i, j
    cdef double d
    cdef array[double] arr, template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)
        for j in range(L):
            # initialization
            arr[j] = np_array[j]
            # access
            d = arr[j]
            arr[j] = d*2.
    # Prevents dead code elimination
    return np.asarray(arr)


@timefunc("cpython.array memoryview")
def _(int L, np.ndarray[double, ndim=1, mode="c"] np_array not None):
    cdef int i, j
    cdef double d
    cdef double[::1] arr

    for i in range(loops):
        arr = np_array
        for j in range(L):
            # usage
            d = arr[j]
            arr[j] = d*0.
    # Prevents dead code elimination
    return np_array
    

@timefunc("cpython.array raw C type with trick")
def _(int L, np.ndarray[double, ndim=1, mode="c"] np_array not None):
    cdef int i
    cdef array arr, template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)
        for j in range(L):
            # initialization
            arr.data.as_doubles[j] = np_array[j]
            # usage
            d = arr.data.as_doubles[j]
            arr.data.as_doubles[j] = d*2.
    # Prevents dead code elimination
    return np.asarray(arr)


@timefunc("C pointers")
def _(int L, np.ndarray[double, ndim=1, mode="c"] np_array not None):
    cdef int i
    cdef double* arrptr

    for i in range(loops):
        arrptr = <double*> np_array.data
        for j in range(L):
            d = arrptr[j]
            arrptr[j] = d*0.

    return np_array

@timefunc("malloc memoryview")
def _(int L, np.ndarray[double, ndim=1, mode="c"] np_array not None):
    cdef int i
    cdef double* arrptr
    cdef double[::1] arr

    for i in range(loops):
        arrptr = <double*> np_array.data
        arr = <double[:L]>arrptr
        for j in range(L):
            d = arrptr[j]
            arrptr[j] = d*0.

    return np_array

@timefunc("argument memoryview")
def _(int L, double[::1] np_array not None):
    cdef int i, j
    cdef double d

    for i in range(loops):
        for j in range(L):
            # usage
            d = np_array[j]
            np_array[j] = d*0.
    # Prevents dead code elimination
    return np_array
-------- TESTS -------
Running numpy buffers
0.008000 0.016333 0.122333 0.781333 7.176333 59.649333 618.152667 μs
Running cpython.array buffer
0.201333 0.123667 0.381000 0.813333 7.674000 61.717333 1893.197667 μs
Running cpython.array memoryview
0.817000 0.906000 1.233000 1.793000 6.210333 47.117333 533.513333 μs
Running cpython.array raw C type with trick
0.066333 0.080667 0.393000 1.578667 10.331667 98.201333 2234.453667 μs
Running C pointers
0.006000 0.005667 0.036667 0.233000 3.214333 25.526667 355.021667 μs
Running malloc memoryview
0.519333 0.826667 0.838667 0.745667 3.227667 23.325000 378.858333 μs
Running argument memoryview
0.011333 0.008333 0.056667 0.434333 5.270667 44.856667 529.291333 μs

In conclusion: For all cases, you will gain a 2x factor speed up by using a C pointer. Since the memory is already allocated for the numpy array, it is not necessary to use malloc. We will adopt the following declaration:

cdef double* arrptr
arrptr = <double*> np_array.data

Note that for all functions we declared the numpy array in the function header.

Parallelization and arrays

After optimizing the code, the obvious step to speed-up the code is to parallelize. From the documentation it seems that this should be quite easy, but I discovered a few things to keep in mind. Let's start with a simple loop example

In [7]:
import Cython.Compiler.Options as CO
CO.extra_compile_args = ["-O3", "-ffast-math", "-march=native", "-fopenmp" ]
CO.extra_link_args = ['-fopenmp']
In [8]:
%%cython --compile=-fopenmp --link-args=-fopenmp

cimport cython

from cython.parallel cimport parallel, prange
from cython.parallel cimport threadid
from libc.stdio cimport stdout, fprintf
import time
import sys

from cpython.array cimport array, clone
from cython.view cimport array as cvarray
from libc.stdlib cimport malloc, free


@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
cdef inline void seq_func(int L, double* arrptr):
    cdef int j

    for j in range(L):
        arrptr[j] = 2.0*arrptr[j]
    return

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
cdef inline void bad_par_func(int L, double* arrptr):
    cdef Py_ssize_t j
    cdef double d

    with nogil, parallel():
        arrptr[0] = 0
        for j in prange(1, L-1):
            # or any other operation that doesn't allow to the code parallelized
            arrptr[j+1] = 2.0*arrptr[j]-arrptr[j-1]
        arrptr[L-1] = 0
    return

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
cdef inline void good_par_func(int L, double* arrptr) nogil:
    cdef Py_ssize_t j

    for j in prange(L, nogil=True):
        arrptr[j] = 2.0*arrptr[j]
    return


cdef int L, i, loops = 1000, ilps
cdef double start, end, res
cdef double t0=0.0, t1=0.0, t2=0.0
cdef double* tab
for L in [1000, 10000, 100000]:
    tab = <double *> malloc(sizeof(double) * L)
    print("For L = ", L)
    for ilps in range(loops):
        # ..................................................
        start = time.clock()
        seq_func(L, tab)
        end = time.clock()
        t0 += (end - start) / loops
        # ..................................................
        start = time.clock()
        bad_par_func(L, tab)
        end = time.clock()
        t1 += (end - start) / loops
        # ..................................................
        start = time.clock()
        good_par_func(L, tab)
        end = time.clock()
        t2 += (end - start) / loops

    print(format(t0 * 1e6, "2f"), "μs, using the sequential loop")
    print(format(t1 * 1e6, "2f"), "μs, using the parallel 1 loop")
    print(format(t2 * 1e6, "2f"), "μs, using the parallel 2 loop")
    
    free(tab)
For L =  1000
24.909000 μs, using the sequential loop
71.584000 μs, using the parallel 1 loop
51.277000 μs, using the parallel 2 loop
For L =  10000
83.992000 μs, using the sequential loop
641.786000 μs, using the parallel 1 loop
220.366000 μs, using the parallel 2 loop
For L =  100000
345.196000 μs, using the sequential loop
4159.920000 μs, using the parallel 1 loop
969.578000 μs, using the parallel 2 loop

Other errors to avoid is to add variables incrementation on the parallel part, e.g. i += 1

In [10]:
%%cython --compile-args=-fopenmp --link-args=-fopenmp -a

cimport cython

from cython.parallel import parallel, prange
from libc.stdlib cimport abort, malloc, free
import time, sys
import numpy as np
cimport numpy as np


cdef int loops

def timefunc(name):
    def timedecorator(f):
        cdef int L, i
        cdef np.ndarray np_array
        cdef np.ndarray[double] global_buf

        print("Running", name)
        for L in [10000, 1000000]:
            np_array = np.ones(L)
            global_buf = np_array
            start = time.clock()
            f(global_buf, L, <int>(L/2))
            end = time.clock()
            print(format((end-start) / loops * 1e6, "2f"), end=" ")
            sys.stdout.flush()

        print("μs")
    return timedecorator

print()
print("-------- TESTS -------")
loops = 1000

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
@timefunc("Static allocation for n=2")
def _(double[::1] global_buf not None, int n, int n2):
    cdef double[2] local_buf
    cdef int idx, i

    with nogil, parallel():
        for i in range(loops):
            for idx in prange(n2, schedule='guided'):
                local_buf[0] = global_buf[idx*2]
                local_buf[1] = global_buf[idx*2+1]
                func(local_buf)
    return

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
@timefunc("Dynamic allocation for n=2")
def _(double[::1] global_buf not None, int n, int n2):
    cdef double* local_buf
    cdef int idx, i

    with nogil, parallel():
        for i in range(loops):
            local_buf = <double *> malloc(sizeof(double) * 2)
            for idx in prange(n2, schedule='guided'):
                local_buf[0] = global_buf[idx*2]
                local_buf[1] = global_buf[idx*2+1]
                func(local_buf)
            free(local_buf)

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
@timefunc("Static allocation for n=4")
def _(double[::1] global_buf not None, int n, int n2):
    cdef double[4] local_buf
    cdef int idx, i, n4 = <int> (n2/2)

    with nogil, parallel():
        for i in range(loops):
            for idx in prange(n4, schedule='guided'):
                local_buf[0] = global_buf[idx*4]
                local_buf[1] = global_buf[idx*4+1]
                local_buf[2] = global_buf[idx*4+2]
                local_buf[3] = global_buf[idx*4+3]
                func(local_buf)
    return

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
@timefunc("Dynamic allocation for n=4")
def _(double[::1] global_buf not None, int n, int n2):
    cdef double* local_buf
    cdef int idx, i, n4 = <int> (n2/2)

    with nogil, parallel():
        for i in range(loops):
            local_buf = <double *> malloc(sizeof(double) * 4)
            for idx in prange(n4, schedule='guided'):
                local_buf[0] = global_buf[idx*4]
                local_buf[1] = global_buf[idx*4+1]
                local_buf[2] = global_buf[idx*4+2]
                local_buf[3] = global_buf[idx*4+3]
                func(local_buf)
            free(local_buf)
        
# ==============================================================================
# test function
cdef void func(double* local_buf) nogil:
    cdef int i=0
    return
-------- TESTS -------
Running Static allocation for n=2
35.271000 4394.811000 μs
Running Dynamic allocation for n=2
27.964000 611.417000 μs
Running Static allocation for n=4
185.051000 35408.322000 μs
Running Dynamic allocation for n=4
9.119000 313.265000 μs
Out[10]:
Cython: _cython_magic_3581c8be701b141ba2e4a4555cbc9e45.pyx

Generated by Cython 0.29.7

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

 001: 
+002: cimport cython
  __pyx_t_3 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 2, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_3) < 0) __PYX_ERR(0, 2, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
 003: 
 004: from cython.parallel import parallel, prange
 005: from libc.stdlib cimport abort, malloc, free
+006: import time, sys
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_time, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_time, __pyx_t_1) < 0) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_sys, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_sys, __pyx_t_1) < 0) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+007: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 008: cimport numpy as np
 009: 
 010: 
 011: cdef int loops
 012: 
+013: def timefunc(name):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_1timefunc(PyObject *__pyx_self, PyObject *__pyx_v_name); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_1timefunc = {"timefunc", (PyCFunction)__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_1timefunc, METH_O, 0};
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_1timefunc(PyObject *__pyx_self, PyObject *__pyx_v_name) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("timefunc (wrapper)", 0);
  __pyx_r = __pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_timefunc(__pyx_self, ((PyObject *)__pyx_v_name));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
static PyObject *__pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_timefunc(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_name) {
  struct __pyx_obj_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45___pyx_scope_struct__timefunc *__pyx_cur_scope;
  PyObject *__pyx_v_timedecorator = 0;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("timefunc", 0);
  __pyx_cur_scope = (struct __pyx_obj_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45___pyx_scope_struct__timefunc *)__pyx_tp_new_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45___pyx_scope_struct__timefunc(__pyx_ptype_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45___pyx_scope_struct__timefunc, __pyx_empty_tuple, NULL);
  if (unlikely(!__pyx_cur_scope)) {
    __pyx_cur_scope = ((struct __pyx_obj_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45___pyx_scope_struct__timefunc *)Py_None);
    __Pyx_INCREF(Py_None);
    __PYX_ERR(0, 13, __pyx_L1_error)
  } else {
    __Pyx_GOTREF(__pyx_cur_scope);
  }
  __pyx_cur_scope->__pyx_v_name = __pyx_v_name;
  __Pyx_INCREF(__pyx_cur_scope->__pyx_v_name);
  __Pyx_GIVEREF(__pyx_cur_scope->__pyx_v_name);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("_cython_magic_3581c8be701b141ba2e4a4555cbc9e45.timefunc", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_timedecorator);
  __Pyx_DECREF(((PyObject *)__pyx_cur_scope));
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__31 = PyTuple_Pack(3, __pyx_n_s_name, __pyx_n_s_timedecorator, __pyx_n_s_timedecorator); if (unlikely(!__pyx_tuple__31)) __PYX_ERR(0, 13, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__31);
  __Pyx_GIVEREF(__pyx_tuple__31);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_1timefunc, NULL, __pyx_n_s_cython_magic_3581c8be701b141ba2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_timefunc, __pyx_t_1) < 0) __PYX_ERR(0, 13, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_codeobj__32 = (PyObject*)__Pyx_PyCode_New(1, 0, 3, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__31, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_mendoza_ipython_cython__c, __pyx_n_s_timefunc, 13, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__32)) __PYX_ERR(0, 13, __pyx_L1_error)
/* … */
struct __pyx_obj_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45___pyx_scope_struct__timefunc {
  PyObject_HEAD
  PyObject *__pyx_v_name;
};

+014:     def timedecorator(f):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_8timefunc_1timedecorator(PyObject *__pyx_self, PyObject *__pyx_v_f); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_8timefunc_1timedecorator = {"timedecorator", (PyCFunction)__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_8timefunc_1timedecorator, METH_O, 0};
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_8timefunc_1timedecorator(PyObject *__pyx_self, PyObject *__pyx_v_f) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("timedecorator (wrapper)", 0);
  __pyx_r = __pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_8timefunc_timedecorator(__pyx_self, ((PyObject *)__pyx_v_f));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_8timefunc_timedecorator(PyObject *__pyx_self, PyObject *__pyx_v_f) {
  struct __pyx_obj_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45___pyx_scope_struct__timefunc *__pyx_cur_scope;
  struct __pyx_obj_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45___pyx_scope_struct__timefunc *__pyx_outer_scope;
  int __pyx_v_L;
  PyArrayObject *__pyx_v_np_array = 0;
  PyArrayObject *__pyx_v_global_buf = 0;
  PyObject *__pyx_v_start = NULL;
  PyObject *__pyx_v_end = NULL;
  __Pyx_LocalBuf_ND __pyx_pybuffernd_global_buf;
  __Pyx_Buffer __pyx_pybuffer_global_buf;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("timedecorator", 0);
  __pyx_outer_scope = (struct __pyx_obj_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45___pyx_scope_struct__timefunc *) __Pyx_CyFunction_GetClosure(__pyx_self);
  __pyx_cur_scope = __pyx_outer_scope;
  __pyx_pybuffer_global_buf.pybuffer.buf = NULL;
  __pyx_pybuffer_global_buf.refcount = 0;
  __pyx_pybuffernd_global_buf.data = NULL;
  __pyx_pybuffernd_global_buf.rcbuffer = &__pyx_pybuffer_global_buf;
/* … */
  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_XDECREF(__pyx_t_7);
  __Pyx_XDECREF(__pyx_t_11);
  __Pyx_XDECREF(__pyx_t_12);
  { PyObject *__pyx_type, *__pyx_value, *__pyx_tb;
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb);
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_global_buf.rcbuffer->pybuffer);
  __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);}
  __Pyx_AddTraceback("_cython_magic_3581c8be701b141ba2e4a4555cbc9e45.timefunc.timedecorator", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  goto __pyx_L2;
  __pyx_L0:;
  __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_global_buf.rcbuffer->pybuffer);
  __pyx_L2:;
  __Pyx_XDECREF((PyObject *)__pyx_v_np_array);
  __Pyx_XDECREF((PyObject *)__pyx_v_global_buf);
  __Pyx_XDECREF(__pyx_v_start);
  __Pyx_XDECREF(__pyx_v_end);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__4 = PyTuple_Pack(7, __pyx_n_s_f, __pyx_n_s_L, __pyx_n_s_i, __pyx_n_s_np_array, __pyx_n_s_global_buf, __pyx_n_s_start, __pyx_n_s_end); if (unlikely(!__pyx_tuple__4)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__4);
  __Pyx_GIVEREF(__pyx_tuple__4);
/* … */
  __pyx_t_1 = __Pyx_CyFunction_NewEx(&__pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_8timefunc_1timedecorator, 0, __pyx_n_s_timefunc_locals_timedecorator, ((PyObject*)__pyx_cur_scope), __pyx_n_s_cython_magic_3581c8be701b141ba2, __pyx_d, ((PyObject *)__pyx_codeobj__5)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_v_timedecorator = __pyx_t_1;
  __pyx_t_1 = 0;
  __pyx_codeobj__5 = (PyObject*)__Pyx_PyCode_New(1, 0, 7, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__4, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_mendoza_ipython_cython__c, __pyx_n_s_timedecorator, 14, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__5)) __PYX_ERR(0, 14, __pyx_L1_error)
 015:         cdef int L, i
 016:         cdef np.ndarray np_array
 017:         cdef np.ndarray[double] global_buf
 018: 
+019:         print("Running", name)
  if (unlikely(!__pyx_cur_scope->__pyx_v_name)) { __Pyx_RaiseClosureNameError("name"); __PYX_ERR(0, 19, __pyx_L1_error) }
  __pyx_t_1 = PyTuple_New(2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 19, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_INCREF(__pyx_n_u_Running);
  __Pyx_GIVEREF(__pyx_n_u_Running);
  PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_n_u_Running);
  __Pyx_INCREF(__pyx_cur_scope->__pyx_v_name);
  __Pyx_GIVEREF(__pyx_cur_scope->__pyx_v_name);
  PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_cur_scope->__pyx_v_name);
  __pyx_t_2 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_t_1, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 19, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
+020:         for L in [10000, 1000000]:
  __pyx_t_2 = __pyx_tuple_; __Pyx_INCREF(__pyx_t_2); __pyx_t_3 = 0;
  for (;;) {
    if (__pyx_t_3 >= 2) break;
    #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
    __pyx_t_1 = PyTuple_GET_ITEM(__pyx_t_2, __pyx_t_3); __Pyx_INCREF(__pyx_t_1); __pyx_t_3++; if (unlikely(0 < 0)) __PYX_ERR(0, 20, __pyx_L1_error)
    #else
    __pyx_t_1 = PySequence_ITEM(__pyx_t_2, __pyx_t_3); __pyx_t_3++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 20, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    #endif
    __pyx_t_4 = __Pyx_PyInt_As_int(__pyx_t_1); if (unlikely((__pyx_t_4 == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 20, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
    __pyx_v_L = __pyx_t_4;
/* … */
  }
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
/* … */
  __pyx_tuple_ = PyTuple_Pack(2, __pyx_int_10000, __pyx_int_1000000); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 20, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple_);
  __Pyx_GIVEREF(__pyx_tuple_);
+021:             np_array = np.ones(L)
    __Pyx_GetModuleGlobalName(__pyx_t_5, __pyx_n_s_np); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 21, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __pyx_t_6 = __Pyx_PyObject_GetAttrStr(__pyx_t_5, __pyx_n_s_ones); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 21, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_6);
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    __pyx_t_5 = __Pyx_PyInt_From_int(__pyx_v_L); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 21, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __pyx_t_7 = NULL;
    if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_6))) {
      __pyx_t_7 = PyMethod_GET_SELF(__pyx_t_6);
      if (likely(__pyx_t_7)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_6);
        __Pyx_INCREF(__pyx_t_7);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_6, function);
      }
    }
    __pyx_t_1 = (__pyx_t_7) ? __Pyx_PyObject_Call2Args(__pyx_t_6, __pyx_t_7, __pyx_t_5) : __Pyx_PyObject_CallOneArg(__pyx_t_6, __pyx_t_5);
    __Pyx_XDECREF(__pyx_t_7); __pyx_t_7 = 0;
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 21, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    if (!(likely(((__pyx_t_1) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_1, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 21, __pyx_L1_error)
    __Pyx_XDECREF_SET(__pyx_v_np_array, ((PyArrayObject *)__pyx_t_1));
    __pyx_t_1 = 0;
+022:             global_buf = np_array
    {
      __Pyx_BufFmt_StackElem __pyx_stack[1];
      __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_global_buf.rcbuffer->pybuffer);
      __pyx_t_4 = __Pyx_GetBufferAndValidate(&__pyx_pybuffernd_global_buf.rcbuffer->pybuffer, (PyObject*)((PyArrayObject *)__pyx_v_np_array), &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack);
      if (unlikely(__pyx_t_4 < 0)) {
        PyErr_Fetch(&__pyx_t_8, &__pyx_t_9, &__pyx_t_10);
        if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_global_buf.rcbuffer->pybuffer, (PyObject*)__pyx_v_global_buf, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {
          Py_XDECREF(__pyx_t_8); Py_XDECREF(__pyx_t_9); Py_XDECREF(__pyx_t_10);
          __Pyx_RaiseBufferFallbackError();
        } else {
          PyErr_Restore(__pyx_t_8, __pyx_t_9, __pyx_t_10);
        }
        __pyx_t_8 = __pyx_t_9 = __pyx_t_10 = 0;
      }
      __pyx_pybuffernd_global_buf.diminfo[0].strides = __pyx_pybuffernd_global_buf.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_global_buf.diminfo[0].shape = __pyx_pybuffernd_global_buf.rcbuffer->pybuffer.shape[0];
      if (unlikely(__pyx_t_4 < 0)) __PYX_ERR(0, 22, __pyx_L1_error)
    }
    __Pyx_INCREF(((PyObject *)__pyx_v_np_array));
    __Pyx_XDECREF_SET(__pyx_v_global_buf, ((PyArrayObject *)__pyx_v_np_array));
+023:             start = time.clock()
    __Pyx_GetModuleGlobalName(__pyx_t_6, __pyx_n_s_time); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 23, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_6);
    __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_6, __pyx_n_s_clock); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 23, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    __pyx_t_6 = NULL;
    if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_5))) {
      __pyx_t_6 = PyMethod_GET_SELF(__pyx_t_5);
      if (likely(__pyx_t_6)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_5);
        __Pyx_INCREF(__pyx_t_6);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_5, function);
      }
    }
    __pyx_t_1 = (__pyx_t_6) ? __Pyx_PyObject_CallOneArg(__pyx_t_5, __pyx_t_6) : __Pyx_PyObject_CallNoArg(__pyx_t_5);
    __Pyx_XDECREF(__pyx_t_6); __pyx_t_6 = 0;
    if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 23, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    __Pyx_XDECREF_SET(__pyx_v_start, __pyx_t_1);
    __pyx_t_1 = 0;
+024:             f(global_buf, L, <int>(L/2))
    __pyx_t_5 = __Pyx_PyInt_From_int(__pyx_v_L); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 24, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __pyx_t_6 = __Pyx_PyInt_From_int(((int)(((double)__pyx_v_L) / 2.0))); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 24, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_6);
    __Pyx_INCREF(__pyx_v_f);
    __pyx_t_7 = __pyx_v_f; __pyx_t_11 = NULL;
    __pyx_t_4 = 0;
    if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_7))) {
      __pyx_t_11 = PyMethod_GET_SELF(__pyx_t_7);
      if (likely(__pyx_t_11)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_7);
        __Pyx_INCREF(__pyx_t_11);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_7, function);
        __pyx_t_4 = 1;
      }
    }
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_7)) {
      PyObject *__pyx_temp[4] = {__pyx_t_11, ((PyObject *)__pyx_v_global_buf), __pyx_t_5, __pyx_t_6};
      __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_7, __pyx_temp+1-__pyx_t_4, 3+__pyx_t_4); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 24, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_11); __pyx_t_11 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_7)) {
      PyObject *__pyx_temp[4] = {__pyx_t_11, ((PyObject *)__pyx_v_global_buf), __pyx_t_5, __pyx_t_6};
      __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_7, __pyx_temp+1-__pyx_t_4, 3+__pyx_t_4); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 24, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_11); __pyx_t_11 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    } else
    #endif
    {
      __pyx_t_12 = PyTuple_New(3+__pyx_t_4); if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 24, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_12);
      if (__pyx_t_11) {
        __Pyx_GIVEREF(__pyx_t_11); PyTuple_SET_ITEM(__pyx_t_12, 0, __pyx_t_11); __pyx_t_11 = NULL;
      }
      __Pyx_INCREF(((PyObject *)__pyx_v_global_buf));
      __Pyx_GIVEREF(((PyObject *)__pyx_v_global_buf));
      PyTuple_SET_ITEM(__pyx_t_12, 0+__pyx_t_4, ((PyObject *)__pyx_v_global_buf));
      __Pyx_GIVEREF(__pyx_t_5);
      PyTuple_SET_ITEM(__pyx_t_12, 1+__pyx_t_4, __pyx_t_5);
      __Pyx_GIVEREF(__pyx_t_6);
      PyTuple_SET_ITEM(__pyx_t_12, 2+__pyx_t_4, __pyx_t_6);
      __pyx_t_5 = 0;
      __pyx_t_6 = 0;
      __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_7, __pyx_t_12, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 24, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;
    }
    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+025:             end = time.clock()
    __Pyx_GetModuleGlobalName(__pyx_t_7, __pyx_n_s_time); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 25, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_7);
    __pyx_t_12 = __Pyx_PyObject_GetAttrStr(__pyx_t_7, __pyx_n_s_clock); if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 25, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_12);
    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
    __pyx_t_7 = NULL;
    if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_12))) {
      __pyx_t_7 = PyMethod_GET_SELF(__pyx_t_12);
      if (likely(__pyx_t_7)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_12);
        __Pyx_INCREF(__pyx_t_7);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_12, function);
      }
    }
    __pyx_t_1 = (__pyx_t_7) ? __Pyx_PyObject_CallOneArg(__pyx_t_12, __pyx_t_7) : __Pyx_PyObject_CallNoArg(__pyx_t_12);
    __Pyx_XDECREF(__pyx_t_7); __pyx_t_7 = 0;
    if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 25, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;
    __Pyx_XDECREF_SET(__pyx_v_end, __pyx_t_1);
    __pyx_t_1 = 0;
+026:             print(format((end-start) / loops * 1e6, "2f"), end=" ")
    __pyx_t_1 = PyNumber_Subtract(__pyx_v_end, __pyx_v_start); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 26, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __pyx_t_12 = __Pyx_PyInt_From_int(__pyx_v_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_loops); if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 26, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_12);
    __pyx_t_7 = __Pyx_PyNumber_Divide(__pyx_t_1, __pyx_t_12); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 26, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_7);
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
    __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;
    __pyx_t_12 = PyNumber_Multiply(__pyx_t_7, __pyx_float_1e6); if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 26, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_12);
    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
    __pyx_t_7 = PyTuple_New(2); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 26, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_7);
    __Pyx_GIVEREF(__pyx_t_12);
    PyTuple_SET_ITEM(__pyx_t_7, 0, __pyx_t_12);
    __Pyx_INCREF(__pyx_kp_u_2f);
    __Pyx_GIVEREF(__pyx_kp_u_2f);
    PyTuple_SET_ITEM(__pyx_t_7, 1, __pyx_kp_u_2f);
    __pyx_t_12 = 0;
    __pyx_t_12 = __Pyx_PyObject_Call(__pyx_builtin_format, __pyx_t_7, NULL); if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 26, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_12);
    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
    __pyx_t_7 = PyTuple_New(1); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 26, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_7);
    __Pyx_GIVEREF(__pyx_t_12);
    PyTuple_SET_ITEM(__pyx_t_7, 0, __pyx_t_12);
    __pyx_t_12 = 0;
    __pyx_t_12 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 26, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_12);
    if (PyDict_SetItem(__pyx_t_12, __pyx_n_s_end, __pyx_kp_u__2) < 0) __PYX_ERR(0, 26, __pyx_L1_error)
    __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_t_7, __pyx_t_12); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 26, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
    __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+027:             sys.stdout.flush()
    __Pyx_GetModuleGlobalName(__pyx_t_12, __pyx_n_s_sys); if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 27, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_12);
    __pyx_t_7 = __Pyx_PyObject_GetAttrStr(__pyx_t_12, __pyx_n_s_stdout); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 27, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_7);
    __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;
    __pyx_t_12 = __Pyx_PyObject_GetAttrStr(__pyx_t_7, __pyx_n_s_flush); if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 27, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_12);
    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
    __pyx_t_7 = NULL;
    if (CYTHON_UNPACK_METHODS && likely(PyMethod_Check(__pyx_t_12))) {
      __pyx_t_7 = PyMethod_GET_SELF(__pyx_t_12);
      if (likely(__pyx_t_7)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_12);
        __Pyx_INCREF(__pyx_t_7);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_12, function);
      }
    }
    __pyx_t_1 = (__pyx_t_7) ? __Pyx_PyObject_CallOneArg(__pyx_t_12, __pyx_t_7) : __Pyx_PyObject_CallNoArg(__pyx_t_12);
    __Pyx_XDECREF(__pyx_t_7); __pyx_t_7 = 0;
    if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 27, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 028: 
+029:         print("μs")
  __pyx_t_2 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_tuple__3, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 29, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
/* … */
  __pyx_tuple__3 = PyTuple_Pack(1, __pyx_n_u_s); if (unlikely(!__pyx_tuple__3)) __PYX_ERR(0, 29, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__3);
  __Pyx_GIVEREF(__pyx_tuple__3);
+030:     return timedecorator
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_timedecorator);
  __pyx_r = __pyx_v_timedecorator;
  goto __pyx_L0;
 031: 
+032: print()
  __pyx_t_1 = __Pyx_PyObject_CallNoArg(__pyx_builtin_print); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 32, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+033: print("-------- TESTS -------")
  __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_tuple__33, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 33, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_tuple__33 = PyTuple_Pack(1, __pyx_kp_u_TESTS); if (unlikely(!__pyx_tuple__33)) __PYX_ERR(0, 33, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__33);
  __Pyx_GIVEREF(__pyx_tuple__33);
+034: loops = 1000
  __pyx_v_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_loops = 0x3E8;
 035: 
 036: @cython.boundscheck(False)  # Deactivate bounds checking
 037: @cython.wraparound(False)   # Deactivate negative indexing.
+038: @timefunc("Static allocation for n=2")
  __Pyx_GetModuleGlobalName(__pyx_t_1, __pyx_n_s_timefunc); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 38, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_1, __pyx_tuple__34, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 38, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_tuple__34 = PyTuple_Pack(1, __pyx_kp_u_Static_allocation_for_n_2); if (unlikely(!__pyx_tuple__34)) __PYX_ERR(0, 38, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__34);
  __Pyx_GIVEREF(__pyx_tuple__34);
/* … */
  __pyx_t_3 = __Pyx_PyObject_CallOneArg(__pyx_t_2, __pyx_t_1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 38, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  if (PyDict_SetItem(__pyx_d, __pyx_n_s__37, __pyx_t_3) < 0) __PYX_ERR(0, 39, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
+039: def _(double[::1] global_buf not None, int n, int n2):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_3_(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_3_ = {"_", (PyCFunction)(void*)(PyCFunctionWithKeywords)__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_3_, METH_VARARGS|METH_KEYWORDS, 0};
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_3_(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  __Pyx_memviewslice __pyx_v_global_buf = { 0, 0, { 0 }, { 0 }, { 0 } };
  CYTHON_UNUSED int __pyx_v_n;
  CYTHON_UNUSED int __pyx_v_n2;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("_ (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_global_buf,&__pyx_n_s_n,&__pyx_n_s_n2,0};
    PyObject* values[3] = {0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        CYTHON_FALLTHROUGH;
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        CYTHON_FALLTHROUGH;
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        CYTHON_FALLTHROUGH;
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_global_buf)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        CYTHON_FALLTHROUGH;
        case  1:
        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_n)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, 1); __PYX_ERR(0, 39, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  2:
        if (likely((values[2] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_n2)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, 2); __PYX_ERR(0, 39, __pyx_L3_error)
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "_") < 0)) __PYX_ERR(0, 39, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
    }
    __pyx_v_global_buf = __Pyx_PyObject_to_MemoryviewSlice_dc_double(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_global_buf.memview)) __PYX_ERR(0, 39, __pyx_L3_error)
    __pyx_v_n = __Pyx_PyInt_As_int(values[1]); if (unlikely((__pyx_v_n == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 39, __pyx_L3_error)
    __pyx_v_n2 = __Pyx_PyInt_As_int(values[2]); if (unlikely((__pyx_v_n2 == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 39, __pyx_L3_error)
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 39, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("_cython_magic_3581c8be701b141ba2e4a4555cbc9e45._", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  if (unlikely(((PyObject *)__pyx_v_global_buf.memview) == Py_None)) {
    PyErr_Format(PyExc_TypeError, "Argument '%.200s' must not be None", "global_buf"); __PYX_ERR(0, 39, __pyx_L1_error)
  }
  __pyx_r = __pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_2_(__pyx_self, __pyx_v_global_buf, __pyx_v_n, __pyx_v_n2);

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_2_(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_global_buf, CYTHON_UNUSED int __pyx_v_n, CYTHON_UNUSED int __pyx_v_n2) {
  double __pyx_v_local_buf[2];
  int __pyx_v_idx;
  CYTHON_UNUSED int __pyx_v_i;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("_", 0);
/* … */
  /* function exit code */
  __pyx_L0:;
  __PYX_XDEC_MEMVIEW(&__pyx_v_global_buf, 1);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__35 = PyTuple_Pack(6, __pyx_n_s_global_buf, __pyx_n_s_n, __pyx_n_s_n2, __pyx_n_s_local_buf, __pyx_n_s_idx, __pyx_n_s_i); if (unlikely(!__pyx_tuple__35)) __PYX_ERR(0, 39, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__35);
  __Pyx_GIVEREF(__pyx_tuple__35);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_3_, NULL, __pyx_n_s_cython_magic_3581c8be701b141ba2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 39, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_codeobj__36 = (PyObject*)__Pyx_PyCode_New(3, 0, 6, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__35, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_mendoza_ipython_cython__c, __pyx_n_s__37, 39, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__36)) __PYX_ERR(0, 39, __pyx_L1_error)
 040:     cdef double[2] local_buf
 041:     cdef int idx, i
 042: 
+043:     with nogil, parallel():
  {
      #ifdef WITH_THREAD
      PyThreadState *_save;
      Py_UNBLOCK_THREADS
      __Pyx_FastGIL_Remember();
      #endif
      /*try:*/ {
        {
            #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
                #undef likely
                #undef unlikely
                #define likely(x)   (x)
                #define unlikely(x) (x)
            #endif
            #ifdef _OPENMP
            #pragma omp parallel private(__pyx_v_i)
            #endif /* _OPENMP */
            {
                /* Initialize private variables to invalid values */
                __pyx_v_i = ((int)0xbad0bad0);
/* … */
      /*finally:*/ {
        /*normal exit:*/{
          #ifdef WITH_THREAD
          __Pyx_FastGIL_Forget();
          Py_BLOCK_THREADS
          #endif
          goto __pyx_L5;
        }
        __pyx_L5:;
      }
  }
+044:         for i in range(loops):
                __pyx_t_1 = __pyx_v_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_loops;
                __pyx_t_2 = __pyx_t_1;
                for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                  __pyx_v_i = __pyx_t_3;
+045:             for idx in prange(n2, schedule='guided'):
                  __pyx_t_4 = __pyx_v_n2;
                  if (1 == 0) abort();
                  {
                      __pyx_t_6 = (__pyx_t_4 - 0 + 1 - 1/abs(1)) / 1;
                      if (__pyx_t_6 > 0)
                      {
                          #ifdef _OPENMP
                          #pragma omp for firstprivate(__pyx_v_idx) lastprivate(__pyx_v_idx) schedule(guided)
                          #endif /* _OPENMP */
                          for (__pyx_t_5 = 0; __pyx_t_5 < __pyx_t_6; __pyx_t_5++){
                              {
                                  __pyx_v_idx = (int)(0 + 1 * __pyx_t_5);
+046:                 local_buf[0] = global_buf[idx*2]
                                  __pyx_t_7 = (__pyx_v_idx * 2);
                                  (__pyx_v_local_buf[0]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_7)) )));
+047:                 local_buf[1] = global_buf[idx*2+1]
                                  __pyx_t_8 = ((__pyx_v_idx * 2) + 1);
                                  (__pyx_v_local_buf[1]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_8)) )));
+048:                 func(local_buf)
                                  __pyx_f_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_func(__pyx_v_local_buf);
                              }
                          }
                      }
                  }
                }
            }
        }
        #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
            #undef likely
            #undef unlikely
            #define likely(x)   __builtin_expect(!!(x), 1)
            #define unlikely(x) __builtin_expect(!!(x), 0)
        #endif
      }
+049:     return
  __Pyx_XDECREF(__pyx_r);
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
 050: 
 051: @cython.boundscheck(False)  # Deactivate bounds checking
 052: @cython.wraparound(False)   # Deactivate negative indexing.
+053: @timefunc("Dynamic allocation for n=2")
  __Pyx_GetModuleGlobalName(__pyx_t_3, __pyx_n_s_timefunc); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 53, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_tuple__38, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 53, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
/* … */
  __pyx_tuple__38 = PyTuple_Pack(1, __pyx_kp_u_Dynamic_allocation_for_n_2); if (unlikely(!__pyx_tuple__38)) __PYX_ERR(0, 53, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__38);
  __Pyx_GIVEREF(__pyx_tuple__38);
/* … */
  __pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_1, __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 53, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  if (PyDict_SetItem(__pyx_d, __pyx_n_s__37, __pyx_t_2) < 0) __PYX_ERR(0, 54, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
+054: def _(double[::1] global_buf not None, int n, int n2):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_5_(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_5_ = {"_", (PyCFunction)(void*)(PyCFunctionWithKeywords)__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_5_, METH_VARARGS|METH_KEYWORDS, 0};
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_5_(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  __Pyx_memviewslice __pyx_v_global_buf = { 0, 0, { 0 }, { 0 }, { 0 } };
  CYTHON_UNUSED int __pyx_v_n;
  CYTHON_UNUSED int __pyx_v_n2;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("_ (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_global_buf,&__pyx_n_s_n,&__pyx_n_s_n2,0};
    PyObject* values[3] = {0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        CYTHON_FALLTHROUGH;
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        CYTHON_FALLTHROUGH;
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        CYTHON_FALLTHROUGH;
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_global_buf)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        CYTHON_FALLTHROUGH;
        case  1:
        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_n)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, 1); __PYX_ERR(0, 54, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  2:
        if (likely((values[2] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_n2)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, 2); __PYX_ERR(0, 54, __pyx_L3_error)
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "_") < 0)) __PYX_ERR(0, 54, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
    }
    __pyx_v_global_buf = __Pyx_PyObject_to_MemoryviewSlice_dc_double(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_global_buf.memview)) __PYX_ERR(0, 54, __pyx_L3_error)
    __pyx_v_n = __Pyx_PyInt_As_int(values[1]); if (unlikely((__pyx_v_n == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 54, __pyx_L3_error)
    __pyx_v_n2 = __Pyx_PyInt_As_int(values[2]); if (unlikely((__pyx_v_n2 == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 54, __pyx_L3_error)
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 54, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("_cython_magic_3581c8be701b141ba2e4a4555cbc9e45._", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  if (unlikely(((PyObject *)__pyx_v_global_buf.memview) == Py_None)) {
    PyErr_Format(PyExc_TypeError, "Argument '%.200s' must not be None", "global_buf"); __PYX_ERR(0, 54, __pyx_L1_error)
  }
  __pyx_r = __pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_4_(__pyx_self, __pyx_v_global_buf, __pyx_v_n, __pyx_v_n2);

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_4_(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_global_buf, CYTHON_UNUSED int __pyx_v_n, CYTHON_UNUSED int __pyx_v_n2) {
  double *__pyx_v_local_buf;
  int __pyx_v_idx;
  CYTHON_UNUSED int __pyx_v_i;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("_", 0);
/* … */
  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  __PYX_XDEC_MEMVIEW(&__pyx_v_global_buf, 1);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__39 = PyTuple_Pack(6, __pyx_n_s_global_buf, __pyx_n_s_n, __pyx_n_s_n2, __pyx_n_s_local_buf, __pyx_n_s_idx, __pyx_n_s_i); if (unlikely(!__pyx_tuple__39)) __PYX_ERR(0, 54, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__39);
  __Pyx_GIVEREF(__pyx_tuple__39);
/* … */
  __pyx_t_3 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_5_, NULL, __pyx_n_s_cython_magic_3581c8be701b141ba2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 54, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_codeobj__40 = (PyObject*)__Pyx_PyCode_New(3, 0, 6, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__39, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_mendoza_ipython_cython__c, __pyx_n_s__37, 54, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__40)) __PYX_ERR(0, 54, __pyx_L1_error)
 055:     cdef double* local_buf
 056:     cdef int idx, i
 057: 
+058:     with nogil, parallel():
  {
      #ifdef WITH_THREAD
      PyThreadState *_save;
      Py_UNBLOCK_THREADS
      __Pyx_FastGIL_Remember();
      #endif
      /*try:*/ {
        {
            #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
                #undef likely
                #undef unlikely
                #define likely(x)   (x)
                #define unlikely(x) (x)
            #endif
            #ifdef _OPENMP
            #pragma omp parallel private(__pyx_v_i, __pyx_v_local_buf)
            #endif /* _OPENMP */
            {
                /* Initialize private variables to invalid values */
                __pyx_v_i = ((int)0xbad0bad0);
                __pyx_v_local_buf = ((double *)1);
/* … */
      /*finally:*/ {
        /*normal exit:*/{
          #ifdef WITH_THREAD
          __Pyx_FastGIL_Forget();
          Py_BLOCK_THREADS
          #endif
          goto __pyx_L5;
        }
        __pyx_L5:;
      }
  }
+059:         for i in range(loops):
                __pyx_t_1 = __pyx_v_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_loops;
                __pyx_t_2 = __pyx_t_1;
                for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                  __pyx_v_i = __pyx_t_3;
+060:             local_buf = <double *> malloc(sizeof(double) * 2)
                  __pyx_v_local_buf = ((double *)malloc(((sizeof(double)) * 2)));
+061:             for idx in prange(n2, schedule='guided'):
                  __pyx_t_4 = __pyx_v_n2;
                  if (1 == 0) abort();
                  {
                      __pyx_t_6 = (__pyx_t_4 - 0 + 1 - 1/abs(1)) / 1;
                      if (__pyx_t_6 > 0)
                      {
                          #ifdef _OPENMP
                          #pragma omp for firstprivate(__pyx_v_idx) lastprivate(__pyx_v_idx) schedule(guided)
                          #endif /* _OPENMP */
                          for (__pyx_t_5 = 0; __pyx_t_5 < __pyx_t_6; __pyx_t_5++){
                              {
                                  __pyx_v_idx = (int)(0 + 1 * __pyx_t_5);
+062:                 local_buf[0] = global_buf[idx*2]
                                  __pyx_t_7 = (__pyx_v_idx * 2);
                                  (__pyx_v_local_buf[0]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_7)) )));
+063:                 local_buf[1] = global_buf[idx*2+1]
                                  __pyx_t_8 = ((__pyx_v_idx * 2) + 1);
                                  (__pyx_v_local_buf[1]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_8)) )));
+064:                 func(local_buf)
                                  __pyx_f_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_func(__pyx_v_local_buf);
                              }
                          }
                      }
                  }
+065:             free(local_buf)
                  free(__pyx_v_local_buf);
                }
            }
        }
        #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
            #undef likely
            #undef unlikely
            #define likely(x)   __builtin_expect(!!(x), 1)
            #define unlikely(x) __builtin_expect(!!(x), 0)
        #endif
      }
 066: 
 067: @cython.boundscheck(False)  # Deactivate bounds checking
 068: @cython.wraparound(False)   # Deactivate negative indexing.
+069: @timefunc("Static allocation for n=4")
  __Pyx_GetModuleGlobalName(__pyx_t_2, __pyx_n_s_timefunc); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 69, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_tuple__41, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 69, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
/* … */
  __pyx_tuple__41 = PyTuple_Pack(1, __pyx_kp_u_Static_allocation_for_n_4); if (unlikely(!__pyx_tuple__41)) __PYX_ERR(0, 69, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__41);
  __Pyx_GIVEREF(__pyx_tuple__41);
/* … */
  __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 69, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  if (PyDict_SetItem(__pyx_d, __pyx_n_s__37, __pyx_t_1) < 0) __PYX_ERR(0, 70, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+070: def _(double[::1] global_buf not None, int n, int n2):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_7_(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_7_ = {"_", (PyCFunction)(void*)(PyCFunctionWithKeywords)__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_7_, METH_VARARGS|METH_KEYWORDS, 0};
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_7_(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  __Pyx_memviewslice __pyx_v_global_buf = { 0, 0, { 0 }, { 0 }, { 0 } };
  CYTHON_UNUSED int __pyx_v_n;
  int __pyx_v_n2;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("_ (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_global_buf,&__pyx_n_s_n,&__pyx_n_s_n2,0};
    PyObject* values[3] = {0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        CYTHON_FALLTHROUGH;
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        CYTHON_FALLTHROUGH;
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        CYTHON_FALLTHROUGH;
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_global_buf)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        CYTHON_FALLTHROUGH;
        case  1:
        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_n)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, 1); __PYX_ERR(0, 70, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  2:
        if (likely((values[2] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_n2)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, 2); __PYX_ERR(0, 70, __pyx_L3_error)
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "_") < 0)) __PYX_ERR(0, 70, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
    }
    __pyx_v_global_buf = __Pyx_PyObject_to_MemoryviewSlice_dc_double(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_global_buf.memview)) __PYX_ERR(0, 70, __pyx_L3_error)
    __pyx_v_n = __Pyx_PyInt_As_int(values[1]); if (unlikely((__pyx_v_n == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 70, __pyx_L3_error)
    __pyx_v_n2 = __Pyx_PyInt_As_int(values[2]); if (unlikely((__pyx_v_n2 == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 70, __pyx_L3_error)
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 70, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("_cython_magic_3581c8be701b141ba2e4a4555cbc9e45._", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  if (unlikely(((PyObject *)__pyx_v_global_buf.memview) == Py_None)) {
    PyErr_Format(PyExc_TypeError, "Argument '%.200s' must not be None", "global_buf"); __PYX_ERR(0, 70, __pyx_L1_error)
  }
  __pyx_r = __pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_6_(__pyx_self, __pyx_v_global_buf, __pyx_v_n, __pyx_v_n2);

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_6_(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_global_buf, CYTHON_UNUSED int __pyx_v_n, int __pyx_v_n2) {
  double __pyx_v_local_buf[4];
  int __pyx_v_idx;
  CYTHON_UNUSED int __pyx_v_i;
  CYTHON_UNUSED int __pyx_v_n4;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("_", 0);
/* … */
  /* function exit code */
  __pyx_L0:;
  __PYX_XDEC_MEMVIEW(&__pyx_v_global_buf, 1);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__42 = PyTuple_Pack(7, __pyx_n_s_global_buf, __pyx_n_s_n, __pyx_n_s_n2, __pyx_n_s_local_buf, __pyx_n_s_idx, __pyx_n_s_i, __pyx_n_s_n4); if (unlikely(!__pyx_tuple__42)) __PYX_ERR(0, 70, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__42);
  __Pyx_GIVEREF(__pyx_tuple__42);
/* … */
  __pyx_t_2 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_7_, NULL, __pyx_n_s_cython_magic_3581c8be701b141ba2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 70, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_codeobj__43 = (PyObject*)__Pyx_PyCode_New(3, 0, 7, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__42, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_mendoza_ipython_cython__c, __pyx_n_s__37, 70, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__43)) __PYX_ERR(0, 70, __pyx_L1_error)
 071:     cdef double[4] local_buf
+072:     cdef int idx, i, n4 = <int> (n2/2)
  __pyx_v_n4 = ((int)(((double)__pyx_v_n2) / 2.0));
 073: 
+074:     with nogil, parallel():
  {
      #ifdef WITH_THREAD
      PyThreadState *_save;
      Py_UNBLOCK_THREADS
      __Pyx_FastGIL_Remember();
      #endif
      /*try:*/ {
        {
            #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
                #undef likely
                #undef unlikely
                #define likely(x)   (x)
                #define unlikely(x) (x)
            #endif
            #ifdef _OPENMP
            #pragma omp parallel private(__pyx_v_i)
            #endif /* _OPENMP */
            {
                /* Initialize private variables to invalid values */
                __pyx_v_i = ((int)0xbad0bad0);
/* … */
      /*finally:*/ {
        /*normal exit:*/{
          #ifdef WITH_THREAD
          __Pyx_FastGIL_Forget();
          Py_BLOCK_THREADS
          #endif
          goto __pyx_L5;
        }
        __pyx_L5:;
      }
  }
+075:         for i in range(loops):
                __pyx_t_1 = __pyx_v_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_loops;
                __pyx_t_2 = __pyx_t_1;
                for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                  __pyx_v_i = __pyx_t_3;
+076:             for idx in prange(n4, schedule='guided'):
                  __pyx_t_4 = __pyx_v_n4;
                  if (1 == 0) abort();
                  {
                      __pyx_t_6 = (__pyx_t_4 - 0 + 1 - 1/abs(1)) / 1;
                      if (__pyx_t_6 > 0)
                      {
                          #ifdef _OPENMP
                          #pragma omp for firstprivate(__pyx_v_idx) lastprivate(__pyx_v_idx) schedule(guided)
                          #endif /* _OPENMP */
                          for (__pyx_t_5 = 0; __pyx_t_5 < __pyx_t_6; __pyx_t_5++){
                              {
                                  __pyx_v_idx = (int)(0 + 1 * __pyx_t_5);
+077:                 local_buf[0] = global_buf[idx*4]
                                  __pyx_t_7 = (__pyx_v_idx * 4);
                                  (__pyx_v_local_buf[0]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_7)) )));
+078:                 local_buf[1] = global_buf[idx*4+1]
                                  __pyx_t_8 = ((__pyx_v_idx * 4) + 1);
                                  (__pyx_v_local_buf[1]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_8)) )));
+079:                 local_buf[2] = global_buf[idx*4+2]
                                  __pyx_t_9 = ((__pyx_v_idx * 4) + 2);
                                  (__pyx_v_local_buf[2]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_9)) )));
+080:                 local_buf[3] = global_buf[idx*4+3]
                                  __pyx_t_10 = ((__pyx_v_idx * 4) + 3);
                                  (__pyx_v_local_buf[3]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_10)) )));
+081:                 func(local_buf)
                                  __pyx_f_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_func(__pyx_v_local_buf);
                              }
                          }
                      }
                  }
                }
            }
        }
        #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
            #undef likely
            #undef unlikely
            #define likely(x)   __builtin_expect(!!(x), 1)
            #define unlikely(x) __builtin_expect(!!(x), 0)
        #endif
      }
+082:     return
  __Pyx_XDECREF(__pyx_r);
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
 083: 
 084: @cython.boundscheck(False)  # Deactivate bounds checking
 085: @cython.wraparound(False)   # Deactivate negative indexing.
+086: @timefunc("Dynamic allocation for n=4")
  __Pyx_GetModuleGlobalName(__pyx_t_1, __pyx_n_s_timefunc); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 86, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_1, __pyx_tuple__44, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 86, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_tuple__44 = PyTuple_Pack(1, __pyx_kp_u_Dynamic_allocation_for_n_4); if (unlikely(!__pyx_tuple__44)) __PYX_ERR(0, 86, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__44);
  __Pyx_GIVEREF(__pyx_tuple__44);
/* … */
  __pyx_t_3 = __Pyx_PyObject_CallOneArg(__pyx_t_2, __pyx_t_1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 86, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  if (PyDict_SetItem(__pyx_d, __pyx_n_s__37, __pyx_t_3) < 0) __PYX_ERR(0, 87, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
+087: def _(double[::1] global_buf not None, int n, int n2):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_9_(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_9_ = {"_", (PyCFunction)(void*)(PyCFunctionWithKeywords)__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_9_, METH_VARARGS|METH_KEYWORDS, 0};
static PyObject *__pyx_pw_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_9_(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  __Pyx_memviewslice __pyx_v_global_buf = { 0, 0, { 0 }, { 0 }, { 0 } };
  CYTHON_UNUSED int __pyx_v_n;
  int __pyx_v_n2;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("_ (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_global_buf,&__pyx_n_s_n,&__pyx_n_s_n2,0};
    PyObject* values[3] = {0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        CYTHON_FALLTHROUGH;
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        CYTHON_FALLTHROUGH;
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        CYTHON_FALLTHROUGH;
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_global_buf)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        CYTHON_FALLTHROUGH;
        case  1:
        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_n)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, 1); __PYX_ERR(0, 87, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  2:
        if (likely((values[2] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_n2)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, 2); __PYX_ERR(0, 87, __pyx_L3_error)
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "_") < 0)) __PYX_ERR(0, 87, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
    }
    __pyx_v_global_buf = __Pyx_PyObject_to_MemoryviewSlice_dc_double(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_global_buf.memview)) __PYX_ERR(0, 87, __pyx_L3_error)
    __pyx_v_n = __Pyx_PyInt_As_int(values[1]); if (unlikely((__pyx_v_n == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 87, __pyx_L3_error)
    __pyx_v_n2 = __Pyx_PyInt_As_int(values[2]); if (unlikely((__pyx_v_n2 == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 87, __pyx_L3_error)
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("_", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 87, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("_cython_magic_3581c8be701b141ba2e4a4555cbc9e45._", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  if (unlikely(((PyObject *)__pyx_v_global_buf.memview) == Py_None)) {
    PyErr_Format(PyExc_TypeError, "Argument '%.200s' must not be None", "global_buf"); __PYX_ERR(0, 87, __pyx_L1_error)
  }
  __pyx_r = __pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_8_(__pyx_self, __pyx_v_global_buf, __pyx_v_n, __pyx_v_n2);

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_8_(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_global_buf, CYTHON_UNUSED int __pyx_v_n, int __pyx_v_n2) {
  double *__pyx_v_local_buf;
  int __pyx_v_idx;
  CYTHON_UNUSED int __pyx_v_i;
  CYTHON_UNUSED int __pyx_v_n4;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("_", 0);
/* … */
  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  __PYX_XDEC_MEMVIEW(&__pyx_v_global_buf, 1);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__45 = PyTuple_Pack(7, __pyx_n_s_global_buf, __pyx_n_s_n, __pyx_n_s_n2, __pyx_n_s_local_buf, __pyx_n_s_idx, __pyx_n_s_i, __pyx_n_s_n4); if (unlikely(!__pyx_tuple__45)) __PYX_ERR(0, 87, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__45);
  __Pyx_GIVEREF(__pyx_tuple__45);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_9_, NULL, __pyx_n_s_cython_magic_3581c8be701b141ba2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 87, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_codeobj__46 = (PyObject*)__Pyx_PyCode_New(3, 0, 7, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__45, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_mendoza_ipython_cython__c, __pyx_n_s__37, 87, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__46)) __PYX_ERR(0, 87, __pyx_L1_error)
 088:     cdef double* local_buf
+089:     cdef int idx, i, n4 = <int> (n2/2)
  __pyx_v_n4 = ((int)(((double)__pyx_v_n2) / 2.0));
 090: 
+091:     with nogil, parallel():
  {
      #ifdef WITH_THREAD
      PyThreadState *_save;
      Py_UNBLOCK_THREADS
      __Pyx_FastGIL_Remember();
      #endif
      /*try:*/ {
        {
            #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
                #undef likely
                #undef unlikely
                #define likely(x)   (x)
                #define unlikely(x) (x)
            #endif
            #ifdef _OPENMP
            #pragma omp parallel private(__pyx_v_i, __pyx_v_local_buf)
            #endif /* _OPENMP */
            {
                /* Initialize private variables to invalid values */
                __pyx_v_i = ((int)0xbad0bad0);
                __pyx_v_local_buf = ((double *)1);
/* … */
      /*finally:*/ {
        /*normal exit:*/{
          #ifdef WITH_THREAD
          __Pyx_FastGIL_Forget();
          Py_BLOCK_THREADS
          #endif
          goto __pyx_L5;
        }
        __pyx_L5:;
      }
  }
+092:         for i in range(loops):
                __pyx_t_1 = __pyx_v_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_loops;
                __pyx_t_2 = __pyx_t_1;
                for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                  __pyx_v_i = __pyx_t_3;
+093:             local_buf = <double *> malloc(sizeof(double) * 4)
                  __pyx_v_local_buf = ((double *)malloc(((sizeof(double)) * 4)));
+094:             for idx in prange(n4, schedule='guided'):
                  __pyx_t_4 = __pyx_v_n4;
                  if (1 == 0) abort();
                  {
                      __pyx_t_6 = (__pyx_t_4 - 0 + 1 - 1/abs(1)) / 1;
                      if (__pyx_t_6 > 0)
                      {
                          #ifdef _OPENMP
                          #pragma omp for firstprivate(__pyx_v_idx) lastprivate(__pyx_v_idx) schedule(guided)
                          #endif /* _OPENMP */
                          for (__pyx_t_5 = 0; __pyx_t_5 < __pyx_t_6; __pyx_t_5++){
                              {
                                  __pyx_v_idx = (int)(0 + 1 * __pyx_t_5);
+095:                 local_buf[0] = global_buf[idx*4]
                                  __pyx_t_7 = (__pyx_v_idx * 4);
                                  (__pyx_v_local_buf[0]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_7)) )));
+096:                 local_buf[1] = global_buf[idx*4+1]
                                  __pyx_t_8 = ((__pyx_v_idx * 4) + 1);
                                  (__pyx_v_local_buf[1]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_8)) )));
+097:                 local_buf[2] = global_buf[idx*4+2]
                                  __pyx_t_9 = ((__pyx_v_idx * 4) + 2);
                                  (__pyx_v_local_buf[2]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_9)) )));
+098:                 local_buf[3] = global_buf[idx*4+3]
                                  __pyx_t_10 = ((__pyx_v_idx * 4) + 3);
                                  (__pyx_v_local_buf[3]) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_global_buf.data) + __pyx_t_10)) )));
+099:                 func(local_buf)
                                  __pyx_f_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_func(__pyx_v_local_buf);
                              }
                          }
                      }
                  }
+100:             free(local_buf)
                  free(__pyx_v_local_buf);
                }
            }
        }
        #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
            #undef likely
            #undef unlikely
            #define likely(x)   __builtin_expect(!!(x), 1)
            #define unlikely(x) __builtin_expect(!!(x), 0)
        #endif
      }
 101: 
 102: # ==============================================================================
 103: # test function
+104: cdef void func(double* local_buf) nogil:
static void __pyx_f_46_cython_magic_3581c8be701b141ba2e4a4555cbc9e45_func(CYTHON_UNUSED double *__pyx_v_local_buf) {
  CYTHON_UNUSED int __pyx_v_i;
/* … */
  /* function exit code */
  __pyx_L0:;
}
+105:     cdef int i=0
  __pyx_v_i = 0;
+106:     return
  goto __pyx_L0;

It might seem counter-intuitive but using a dynamic malloc (and free-ing accordingly) instead of declaring an array statically, will improve the performance of your code.