Down-sampling involves reducing the size of an array while preserving its essential information.
One common method is to group elements into blocks and compute the mean of each block.
To find the group size, you divide the total number of elements by the desired number of groups:
import math

total_elements = 12
shrink_to_elements = 4
group_size = math.ceil(total_elements / shrink_to_elements)
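As a quick check of the formula above, the following sketch evaluates the group size for a length that divides evenly and for one that does not. The helper name `group_size_for` is hypothetical and only used for illustration; it also shows how many elements would be missing once the group size is rounded up.

```python
import math


def group_size_for(total_elements: int, shrink_to_elements: int) -> int:
    """Hypothetical helper: number of elements per group, rounded up."""
    return math.ceil(total_elements / shrink_to_elements)


even = group_size_for(12, 4)    # 12 / 4 = 3 exactly
uneven = group_size_for(14, 4)  # 14 / 4 = 3.5, rounded up to 4

# With rounding up, groups * group size can exceed the input length;
# the difference is the number of elements that would need to be padded.
shortfall = uneven * 4 - 14

print(even, uneven, shortfall)  # 3 4 2
```

Rounding up rather than down means no input element is dropped, at the cost of a final group that may need padding.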
The reshape function rearranges the elements of an array into a new shape. For down-sampling, we use reshape to group elements into blocks. Let's break down how this works: reshaping the 12-element array with a block size of 3 gives the shape (4, 3). This means we have 4 rows, each containing 3 elements.

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
arr_tuple = np.array([
    (1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15), (16, 17, 18),
    (19, 20, 21), (22, 23, 24), (25, 26, 27), (28, 29, 30), (31, 32, 33), (34, 35, 36)
])
"""
Downscaling factor
------------------
>>> downscale_factor = 12 // 4
>>> downscale_factor = len(arr) // target_size
>>> downscale_factor = arr.shape[0] // target_size
"""
downscale_factor = 3
"""
Reshape the array to group elements
-----------------------------------
>>> reshaped_arr_tuple = arr_tuple.reshape(-1, downscale_factor, arr_tuple.shape[1])
"""
reshaped_arr = arr.reshape(-1, downscale_factor)
reshaped_arr_tuple = arr_tuple.reshape(-1, downscale_factor, 3)
"""
Assert the shape of the reshaped array
"""
assert reshaped_arr.shape == (4, 3)
assert reshaped_arr_tuple.shape == (4, 3, 3)
"""
Downsample the array by taking the mean of each block
"""
downsampled_arr = reshaped_arr.mean(axis=1)
downsampled_arr_tuple = reshaped_arr_tuple.mean(axis=1)
"""
Assert the shape of the downsampled array
"""
assert arr.shape == (12,)
assert downsampled_arr.shape == (4,)
assert arr_tuple.shape == (12, 3)
assert downsampled_arr_tuple.shape == (4, 3)
"""
Results
"""
assert np.array_equal(downsampled_arr, np.array([2.0, 5.0, 8.0, 11.0]))
assert np.array_equal(downsampled_arr_tuple, np.array([
    [4.0, 5.0, 6.0],
    [13.0, 14.0, 15.0],
    [22.0, 23.0, 24.0],
    [31.0, 32.0, 33.0]
]))
Axis Parameter (axis=1): The axis parameter of the mean function specifies the axis along which the mean is calculated. In a multidimensional array, axis=0 is the first dimension and axis=1 is the second. In the reshaped array, the first dimension indexes the blocks and the second dimension, whose length is downscale_factor, indexes the elements inside each block, so axis=1 averages within each block.
Mean Calculation: When calculating the mean along axis=1, NumPy computes the mean of each block of downscale_factor elements. Since each block contains tuples of three values, the mean is calculated separately for each position across the tuples in a block. The result is a new array in which each element is the mean of the corresponding elements of the tuples in the original block.
Reshaping: The -1 in the reshape function tells NumPy to calculate the appropriate size for that dimension.
Averaging: The mean of each block is calculated along axis 1.
Three Parameters: The reshape function here takes three parameters that specify the new shape of the array.
- First Parameter (-1): This is an automatic dimension. When you pass -1, NumPy calculates the size of that dimension from the array's total size and the other dimensions you specify. If the original array holds N values in total and the other dimensions are downscale_factor and 3, the -1 is replaced with N / (downscale_factor * 3). For arr_tuple, that is 36 / (3 * 3) = 4.
- Second Parameter (downscale_factor): This specifies the second dimension of the reshaped array. It groups the elements of the original array into blocks of size downscale_factor.
- Third Parameter (3): This specifies the third dimension of the reshaped array. Since arr_tuple contains tuples of three elements each, this dimension keeps each tuple intact as a separate entity within the reshaped array.
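The role of the three reshape parameters can be checked with a small sketch. The array below is built with `np.arange` purely for brevity and is assumed to have the same layout as arr_tuple (12 rows of 3 values); the failing reshape with a block size of 5 is an illustrative assumption, not part of the original example.

```python
import numpy as np

downscale_factor = 3
# 12 rows of 3 values each, 36 scalars in total (same layout as arr_tuple)
arr_tuple = np.arange(1, 37).reshape(12, 3)

# -1 is inferred as 36 / (downscale_factor * 3) = 4
reshaped = arr_tuple.reshape(-1, downscale_factor, arr_tuple.shape[1])
print(reshaped.shape)  # (4, 3, 3)

# Each original row stays intact inside its block
print(reshaped[0].tolist())  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# If the total size is not divisible by the fixed dimensions, reshape fails:
# 36 is not a multiple of 5 * 3
try:
    arr_tuple.reshape(-1, 5, 3)
except ValueError as exc:
    error_message = str(exc)
    print("reshape failed:", error_message)
```

The failing case is the reason uneven arrays need padding before they can be reshaped into blocks.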
If you have a reshaped array like this:
[[(1, 2, 3), (4, 5, 6), (7, 8, 9)],
[(10, 11, 12), (13, 14, 15), (16, 17, 18)],
[(19, 20, 21), (22, 23, 24), (25, 26, 27)],
[(28, 29, 30), (31, 32, 33), (34, 35, 36)]]
And you calculate the mean along axis=1, you will get:
[((1+4+7)/3, (2+5+8)/3, (3+6+9)/3),
 ((10+13+16)/3, (11+14+17)/3, (12+15+18)/3),
 ((19+22+25)/3, (20+23+26)/3, (21+24+27)/3),
 ((28+31+34)/3, (29+32+35)/3, (30+33+36)/3)]
which evaluates to:
[(4.0, 5.0, 6.0), (13.0, 14.0, 15.0), (22.0, 23.0, 24.0), (31.0, 32.0, 33.0)]
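The hand calculation above can be compared against NumPy directly. This is a minimal sketch that rebuilds the reshaped array with `np.arange` (an assumption made for brevity) and contrasts axis=1 with axis=2 to show what each one averages.

```python
import numpy as np

# Same values as the reshaped array shown above: 4 blocks of 3 tuples
blocks = np.arange(1, 37, dtype=np.float64).reshape(4, 3, 3)

# axis=1 averages the three tuples inside each block, position by position
block_means = blocks.mean(axis=1)

# The first block written out by hand
manual_first = np.array([(1 + 4 + 7) / 3, (2 + 5 + 8) / 3, (3 + 6 + 9) / 3])

print(block_means[0].tolist())   # [4.0, 5.0, 6.0]
print(manual_first.tolist())     # [4.0, 5.0, 6.0]

# For contrast, axis=2 would average the values within each tuple instead
tuple_means = blocks.mean(axis=2)
print(tuple_means[0].tolist())   # [2.0, 5.0, 8.0]
```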
Down-sampling by reshaping requires the length of the array to be a multiple of downscale_factor, without any remainder. If it is not, we need to add the missing elements to the end of the array.
NumPy has a function, np.pad, that adds elements to the start or end of an array. It can pad with zeros, a constant value, or a statistic such as the mean of the array.
One option is np.pad(mode="mean", stat_length=...), which pads with the mean of the stat_length values at the edge of the array. The worked example in this section computes that mean explicitly and passes it through mode='constant' and constant_values. In both cases, the missing elements are added to the end of the array.
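For comparison, here is a minimal sketch of the mode="mean" variant mentioned above. The values of `missing_elements` and `remaining_elements` are hard-coded for a 14-element array and a block size of 4; they are assumptions of this sketch rather than computed values.

```python
import numpy as np

uneven_arr = np.arange(1.0, 15.0)  # 14 float values: 1.0 ... 14.0
missing_elements = 2               # needed to reach a length of 16
remaining_elements = 2             # 14 % 4, the leftover elements at the end

padded = np.pad(
    uneven_arr,
    pad_width=(0, missing_elements),  # pad only at the end
    mode="mean",
    stat_length=remaining_elements,   # average only the last 2 values
)

print(padded.shape)           # (16,)
print(padded[-4:].tolist())   # [13.0, 14.0, 13.5, 13.5]
```

The input is a float array on purpose: np.pad keeps the dtype of its input, so an integer array would not be able to hold a fractional padding value such as 13.5.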
uneven_arr = np.array(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
)
target_size = 4
total_elements = uneven_arr.shape[0]  # 14
downscale_factor = math.ceil(total_elements / target_size)  # 4

# Calculate missing_elements
missing_elements = downscale_factor * target_size - total_elements  # 2
remaining_elements = total_elements % downscale_factor  # 2

# Pad the array to make its length divisible by the downscale factor
mean_value = np.mean(uneven_arr[-remaining_elements:]) if remaining_elements != 0 else 0
padded_arr = np.pad(
    uneven_arr,
    pad_width=(0, missing_elements),
    mode='constant',
    constant_values=mean_value
)
# Assertions to verify correctness
assert np.array_equal(padded_arr, np.array([
1.0, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0,
9.0, 10.0, 11.0, 12.0,
13.0, 14.0, 13.5, 13.5
]))
# Reshape the array
reshaped_uneven_arr = padded_arr.reshape(-1, downscale_factor)
assert np.array_equal(reshaped_uneven_arr, np.array([
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 13.5, 13.5]
]))
Padding for Uneven Arrays: The code demonstrates how to handle arrays whose length is not evenly divisible by the target size.
Calculating Remaining Elements:
remaining_elements = total_elements % downscale_factor
This calculates how many elements are left over after dividing the array into groups of downscale_factor size.
Mean Value for Padding:
mean_value = np.mean(uneven_arr[-remaining_elements:]) if remaining_elements != 0 else 0
This calculates the mean of the remaining elements to use as padding. If there are no remaining elements, it defaults to 0.
Padding the Array:
padded_arr = np.pad(
    uneven_arr,
    pad_width=(0, missing_elements),
    mode='constant',
    constant_values=mean_value
)
This pads the array with the calculated mean value to make its length divisible by the downscale factor.
Reshaping Padded Array:
reshaped_uneven_arr = padded_arr.reshape(-1, downscale_factor)
After padding, the array is reshaped into groups of downscale_factor size.
Flexibility in Down-sampling: This approach allows for down-sampling to any target size, not just sizes that are factors of the original array length.
Preserving Data Characteristics: By using the mean of remaining elements for padding, the method preserves the statistical properties of the data at the end of the array.
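The effect of padding with the mean of the remaining elements can be seen on the last block alone. This sketch uses the two leftover values from the 14-element example and compares mean padding with zero padding; the variable names are illustrative.

```python
import numpy as np

tail = np.array([13.0, 14.0])      # leftover elements of the last block
padding = np.full(2, tail.mean())  # two pad values, each 13.5
last_block = np.concatenate([tail, padding])

# Padding with the tail's own mean leaves the block mean unchanged
print(last_block.mean(), tail.mean())  # 13.5 13.5

# Padding with zeros instead would pull the block mean down
zero_padded = np.concatenate([tail, np.zeros(2)])
print(zero_padded.mean())  # 6.75
```

This is why the down-sampled value for the final block equals the plain average of the elements that were actually present.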
This example demonstrates how to down-sample a NumPy array by averaging groups of elements.
def downsample(arr: np.ndarray, target_size: int) -> np.ndarray:
    """
    Downsample a NumPy array by averaging groups of elements.

    Example
    -------
    >>> arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
    >>> target_size = 4
    >>> downsampled_arr = downsample(arr, target_size)
    >>> print(downsampled_arr)
    <<< [ 2.5  6.5 10.5 13.5]

    Parameters
    ----------
    arr : np.ndarray
        The input array to be downsampled.
    target_size : int
        The desired number of elements after downsampling.

    Returns
    -------
    np.ndarray
        The downsampled array.
    """
    # np.pad keeps the input dtype, so cast to float first; otherwise a
    # fractional padding value such as 13.5 is truncated for integer input
    arr = np.asarray(arr, dtype=np.float64)
    total_elements = arr.shape[0]
    downscale_factor = math.ceil(total_elements / target_size)

    # Calculate missing elements
    missing_elements = downscale_factor * target_size - total_elements
    remaining_elements = total_elements % downscale_factor

    # Pad the array to make its length divisible by the downscale factor.
    # Note: if remaining_elements is 0 while missing_elements is not
    # (e.g. 9 elements, target_size 4), the padding value is 0.
    if remaining_elements != 0:
        mean_value = np.mean(arr[-remaining_elements:])
    else:
        mean_value = 0

    padded_arr = np.pad(
        arr,
        pad_width=(0, missing_elements),
        mode='constant',
        constant_values=mean_value
    )

    # Reshape the array
    reshaped_arr = padded_arr.reshape(-1, downscale_factor)

    # Downsample the array by taking the mean of each block
    return reshaped_arr.mean(axis=1)
def downsample_tupled(arr: np.ndarray, target_size: int) -> np.ndarray:
    """
    Downsample a NumPy array by averaging groups of elements.

    This other implementation uses `vstack` instead of `pad`.

    Example
    -------
    >>> print(
    ...     downsample_tupled(
    ...         np.array([[2, 2], [2, 4], [2, 6], [7, 8], [9, 10], [11, 12], [13, 14]]), 3
    ...     )
    ... )
    <<< [[ 2.  4.]
    ...  [ 9. 10.]
    ...  [13. 14.]]

    >>> print(downsample_tupled(
    ...     np.array(
    ...         [
    ...             [1, 2, 7],
    ...             [3, 4, 8],
    ...             [5, 6, 9],
    ...             [7, 8, 10],
    ...             [9, 10, 13],
    ...             [11, 12, 14],
    ...             [13, 14, 15],
    ...         ]
    ...     ),
    ...     3,
    ... ))
    <<< [[ 3.          4.          8.        ]
    ...  [ 9.         10.         12.33333333]
    ...  [13.         14.         15.        ]]

    Parameters
    ----------
    arr : np.ndarray
        The input array to be downsampled.
    target_size : int
        The desired number of elements after downsampling.

    Returns
    -------
    np.ndarray
        The downsampled array.
    """
    total_elements = arr.shape[0]
    downscale_factor = math.ceil(total_elements / target_size)

    # Calculate missing elements
    missing_elements = downscale_factor * target_size - total_elements
    remaining_elements = total_elements % downscale_factor

    # Calculate the mean of the last `remaining_elements` rows
    mean_value = (
        np.mean(arr[-remaining_elements:], axis=0)
        if remaining_elements != 0
        else np.zeros(arr.shape[1:], dtype=np.float64)
    )

    # Append the missing rows using `vstack`
    padding_rows = np.tile(mean_value, (missing_elements, 1))
    padded_arr = np.vstack([arr, padding_rows])

    # Reshape the array
    reshaped_arr = padded_arr.reshape(-1, downscale_factor, *arr.shape[1:])

    # Downsample the array by taking the mean of each block
    return reshaped_arr.mean(axis=1)