C
C#4d ago
jan

Coursework analysis of quicksort results in (very) unexpected behaviour

Desc: Implemented simple recursive quicksort algorithm to test it on a 'A' shaped arrays of different sizes (eg. [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]). Expected behaviour: (BASED ON MY THOUGHTS) - rightmost pivot: O(n^2) - at least for 'A' shaped arrays - middle pivot: O(n^2) - at least for 'A' shaped arrays - random pivot: from O(n log n) to O(n^2) Actual results are on the diagram. Problems: - random pivot ends up being basically instant (0-2ms) even for arrays of size 72 000 - middle pivot ends up taking 2x the amount of rightmost pivot I also tested it against random shuffled arrays using Random.Shuffle, and for some reason introducing random to all pivot strategies makes all the timings >2ms, so it looks like using random makes any pivot strategy instant. I got no idea why it happens, I've also made sure that all arrays are not sorted before sorting, and sorted after sorting. Code: https://pastebin.com/3zF7SLbz .NET 8 VS2022 Please help..
Pastebin
using System;using System.IO;using System.Collections.Generic;using...
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
31 Replies
jan
janOP4d ago
QuickSort method:
enum PivotSelectionStrategy
{
Rightmost,
Middle,
Random
}

static void QuickSort(int[] array, int leftIndex, int rightIndex, PivotSelectionStrategy pivotStrategy, Random randomGenerator)
{
//metoda rekursywna - kończymy rozgałęzienie, gdy nie ma elementów do posortowania
if (leftIndex >= rightIndex)
return;

//dzielimy tablicę na dwie części
int partitionIndex = Partition(array, leftIndex, rightIndex, pivotStrategy, randomGenerator);

//rekursja dla lewej części
QuickSort(array, leftIndex, partitionIndex - 1, pivotStrategy, randomGenerator);
//i dla prawej
QuickSort(array, partitionIndex + 1, rightIndex, pivotStrategy, randomGenerator);
}
enum PivotSelectionStrategy
{
Rightmost,
Middle,
Random
}

static void QuickSort(int[] array, int leftIndex, int rightIndex, PivotSelectionStrategy pivotStrategy, Random randomGenerator)
{
//metoda rekursywna - kończymy rozgałęzienie, gdy nie ma elementów do posortowania
if (leftIndex >= rightIndex)
return;

//dzielimy tablicę na dwie części
int partitionIndex = Partition(array, leftIndex, rightIndex, pivotStrategy, randomGenerator);

//rekursja dla lewej części
QuickSort(array, leftIndex, partitionIndex - 1, pivotStrategy, randomGenerator);
//i dla prawej
QuickSort(array, partitionIndex + 1, rightIndex, pivotStrategy, randomGenerator);
}
Partition method:
static int Partition(int[] array, int leftIndex, int rightIndex, PivotSelectionStrategy pivotStrategy, Random randomGenerator)
{
int pivotIndex;

//inicjujemy index pivota z wartością zależną od wybranej strategii
switch (pivotStrategy)
{
case PivotSelectionStrategy.Rightmost:
pivotIndex = rightIndex;
break;
case PivotSelectionStrategy.Middle:
pivotIndex = leftIndex + (rightIndex - leftIndex) / 2;
break;
case PivotSelectionStrategy.Random:
pivotIndex = randomGenerator.Next(leftIndex, rightIndex + 1);
break;
default:
pivotIndex = rightIndex;
break;
}
//inicjujemy wartość pivota do porównań
int pivotValue = array[pivotIndex];
//spychamy pivot na koniec dla prostszego for-loopa
Swap(array, pivotIndex, rightIndex);


//inicjujemy index po lewej, mówi on nam do którego elementu *włącznie* leżą elementy mniejsze od pivota
int lastSmallerElementIndex = leftIndex; // '<' zamiast "<=" bo pomijamy pivot, którego umieściliśmy na końcu
for (int currentIndex = leftIndex; currentIndex < rightIndex; currentIndex++)
{
if (array[currentIndex] <= pivotValue)
{
Swap(array, currentIndex, lastSmallerElementIndex);
lastSmallerElementIndex++;
}
}

Swap(array, lastSmallerElementIndex, rightIndex);
return lastSmallerElementIndex;
}
static int Partition(int[] array, int leftIndex, int rightIndex, PivotSelectionStrategy pivotStrategy, Random randomGenerator)
{
int pivotIndex;

//inicjujemy index pivota z wartością zależną od wybranej strategii
switch (pivotStrategy)
{
case PivotSelectionStrategy.Rightmost:
pivotIndex = rightIndex;
break;
case PivotSelectionStrategy.Middle:
pivotIndex = leftIndex + (rightIndex - leftIndex) / 2;
break;
case PivotSelectionStrategy.Random:
pivotIndex = randomGenerator.Next(leftIndex, rightIndex + 1);
break;
default:
pivotIndex = rightIndex;
break;
}
//inicjujemy wartość pivota do porównań
int pivotValue = array[pivotIndex];
//spychamy pivot na koniec dla prostszego for-loopa
Swap(array, pivotIndex, rightIndex);


//inicjujemy index po lewej, mówi on nam do którego elementu *włącznie* leżą elementy mniejsze od pivota
int lastSmallerElementIndex = leftIndex; // '<' zamiast "<=" bo pomijamy pivot, którego umieściliśmy na końcu
for (int currentIndex = leftIndex; currentIndex < rightIndex; currentIndex++)
{
if (array[currentIndex] <= pivotValue)
{
Swap(array, currentIndex, lastSmallerElementIndex);
lastSmallerElementIndex++;
}
}

Swap(array, lastSmallerElementIndex, rightIndex);
return lastSmallerElementIndex;
}
Utils:
static int[] GenerateAShapedArray(int size)
{
int[] array = new int[size];
for (int i = 0; i < size / 2; i++)
{
array[i] = i;
}
for (int i = size / 2; i < size; i++)
{
array[i] = size - i;
}

return array;
}
public static bool IsSorted(int[] array)
{
for (int i = 0; i < array.Length - 1; i++)
{
if (array[i] > array[i + 1])
{
return false;
}
}
return true;
}
static int[] GenerateAShapedArray(int size)
{
int[] array = new int[size];
for (int i = 0; i < size / 2; i++)
{
array[i] = i;
}
for (int i = size / 2; i < size; i++)
{
array[i] = size - i;
}

return array;
}
public static bool IsSorted(int[] array)
{
for (int i = 0; i < array.Length - 1; i++)
{
if (array[i] > array[i + 1])
{
return false;
}
}
return true;
}
Main:
static void Main()
{
string csvOutputFilePath = Path.Combine
(
Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory),
"quick_sort_analysis.csv"
);

using (StreamWriter writer = new StreamWriter(csvOutputFilePath))
{
writer.WriteLine("ArraySize,PivotStrategy,Time(ms)");

int[] arraySizes =
{
4000, 8000, 12000, 16000, 20000, 24000, 28000, 32000, 36000,
40000, 44000, 48000, 52000, 56000, 60000, 64000, 68000, 72000
};

foreach (int arraySize in arraySizes)
{
int[] array = GenerateAShapedArray(arraySize);

foreach (PivotSelectionStrategy pivotStrategy in Enum.GetValues(typeof(PivotSelectionStrategy)))
{

int[] arrayToSort = (int[])array.Clone();


long elapsedMs = 0;
Random threadRandom = new Random();
Swap(arrayToSort, 0, 100);
Swap(arrayToSort, 0, 500);

var sortThread = new Thread(() =>
{
Stopwatch stopwatch = Stopwatch.StartNew();
QuickSort(arrayToSort, 0, arrayToSort.Length - 1, pivotStrategy, threadRandom);
stopwatch.Stop();

Interlocked.Exchange(ref elapsedMs, stopwatch.ElapsedMilliseconds);
}, maxStackSize: 32 * 1024 * 1024);
sortThread.Start();
sortThread.Join();

writer.WriteLine($"{arraySize},{pivotStrategy},{elapsedMs}");
Console.WriteLine($"Array size: {arraySize}, Pivot: {pivotStrategy}, Time: {elapsedMs} ms");
}
}
}
Console.WriteLine($"timingi zrobione, zapisane na {csvOutputFilePath}");
}
static void Main()
{
string csvOutputFilePath = Path.Combine
(
Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory),
"quick_sort_analysis.csv"
);

using (StreamWriter writer = new StreamWriter(csvOutputFilePath))
{
writer.WriteLine("ArraySize,PivotStrategy,Time(ms)");

int[] arraySizes =
{
4000, 8000, 12000, 16000, 20000, 24000, 28000, 32000, 36000,
40000, 44000, 48000, 52000, 56000, 60000, 64000, 68000, 72000
};

foreach (int arraySize in arraySizes)
{
int[] array = GenerateAShapedArray(arraySize);

foreach (PivotSelectionStrategy pivotStrategy in Enum.GetValues(typeof(PivotSelectionStrategy)))
{

int[] arrayToSort = (int[])array.Clone();


long elapsedMs = 0;
Random threadRandom = new Random();
Swap(arrayToSort, 0, 100);
Swap(arrayToSort, 0, 500);

var sortThread = new Thread(() =>
{
Stopwatch stopwatch = Stopwatch.StartNew();
QuickSort(arrayToSort, 0, arrayToSort.Length - 1, pivotStrategy, threadRandom);
stopwatch.Stop();

Interlocked.Exchange(ref elapsedMs, stopwatch.ElapsedMilliseconds);
}, maxStackSize: 32 * 1024 * 1024);
sortThread.Start();
sortThread.Join();

writer.WriteLine($"{arraySize},{pivotStrategy},{elapsedMs}");
Console.WriteLine($"Array size: {arraySize}, Pivot: {pivotStrategy}, Time: {elapsedMs} ms");
}
}
}
Console.WriteLine($"timingi zrobione, zapisane na {csvOutputFilePath}");
}
canton7
canton74d ago
For starters, use BenchmarkDotNet for benchmarking. There are all sorts of confounding factors which can influence micro-benchmarks, such as caches being warmed up, or the JIT deciding to optimise your code further because it's being called a lot
jan
janOP4d ago
I'm gonna go and try that right away
canton7
canton74d ago
I also don't see you calling IsSorted anywhere? So if you had a bug in your quicksort which resulted in it not sorting anything, you probably wouldn't notice?
jan
janOP4d ago
it was commented out because I wanted to give here the clean code and I had to trim it when pasting into discord but I did test it, it wasn't sorted before every sorting and was sorted after every sorting
canton7
canton74d ago
Your code looks fine I wonder whether the cost of the swaps is dominating? For a rightmost pivot, on the first half of the array you're only ever swapping elements with themselves, and on the second half you're not swapping at all. But with a middle pivot, for the second half of the array, you're doing a ton of swaps between non-adjacent elements (I think?). When you're dealing with large partitions, that might end up being quite cache-unfriendly The thing to do would be to BenchmarkDotNet it to see whehter the effect is real (or where it is a cold cache / JIT / etc issue), and then run it in a profiler to see where your time's actually being spent
jan
janOP4d ago
I just ran it, results are basically the same Right most ends up being ~4x faster this time here (JIT?)
canton7
canton74d ago
As in, the same as reported in your question? Or the different pivot strategies have the same cost?
jan
janOP4d ago
random is instant as in the main post wait a second shouldn't rightmost & middle pivot selection strategies be virtually identical?
canton7
canton74d ago
I don't think so?
jan
janOP4d ago
look middle pivot in first iteration gives you pivot being the largest number in the array, so you will end up with first partition full of numbers and the second one empty, because there's no elements larger than the pivot and the rightmost pivot on first iteration gives you pivot being the smaller nubmer which ends up with first partition empty and the second one full so it's just mirrored no?
canton7
canton74d ago
The first iteration is always going to be the odd one out: it's all of the ones after it which matter I think At some point I think you must end up either sorting ascending or descending sequences surely?
jan
janOP4d ago
okay but when you apply same logic to the partitions from 1st iteration, we also get 2 partitions with 1 being empty and the other one being full
canton7
canton74d ago
And spend the majority of your time on those
jan
janOP4d ago
it returns only if it's given an array of size 1, then everything is being put back altogether idk tbh im getting mad at this
canton7
canton74d ago
Hmm yeah you might be right about the middle pivot. I'd have to work it through, but the middle pivot might always be the largest element, so you end up being strictly left-recursive Time to take a break 😄
jan
janOP4d ago
exactly, mirror version of the right-recursive right? im just gonna do that in python tbh im tired of this
Core
Core4d ago
Sorry, I needed to send this
jan
janOP4d ago
I FOUND IT guys omg rightmost vs middle pivot points have similar amount of calls but not SWAPS when we choose righmost pivot, the if-statement in Partition:
if (array[currentIndex] <= pivotValue)
{
Swap(array, currentIndex, lastSmallerElementIndex);
lastSmallerElementIndex++;
}
if (array[currentIndex] <= pivotValue)
{
Swap(array, currentIndex, lastSmallerElementIndex);
lastSmallerElementIndex++;
}
is true only for 1 element (at least on a [1 2 3 4 5 4 3 2 1] array), therefore it executes 1 swap because pivotValue == 1 MEANWHILE when we choose middle element in an A shaped array, we get ~array.Length swaps I've implemented swapCount and callCount and confirmed that middle pivot does ~2-8x more swaps than the rightmost pivot :EZ:
Anton
Anton4d ago
I recommend you use spans here they make it much simpler conceptually
jan
janOP4d ago
out of curiosity, what's simpler in a span? I haven't worked with it yet, got no idea how they work, and although they might be faster/more optimal, I think array is just fine here because it exhausts the needs of this algorithm just right, without making code unreadable/complicated
canton7
canton74d ago
A span can represent a segment of an array. So instead of passing the array, left, and right as 3 separate parameters, you can just pass a single span
Anton
Anton4d ago
Array is more complex than span, being a higher level abstraction. Span is literally the simplest way to indicate a block of memory. They probably won't be much faster The code will be easier to understand though
jan
janOP3d ago
yeah I get it now, it actually looks very logically correct when you think about it would swapping elements of a span propagate to the original array though?
canton7
canton73d ago
Yes A swap is just setting two values, you just happen to be setting each to the other's value Btw, the syntax (a, b) = (b, a) works to do a swap, and it's easier than a function call
Core
Core2d ago
Won't this create a (b,a) tuple, then unpack it?
Anton
Anton2d ago
afaik it's recognized by the compiler as a swap
canton7
canton72d ago
No. The (a, b) syntax on the left-hand side of an assignment never creates a tuple Eg var (x, y) = ... is used to declare two local variables x and y. (a, b) = ... assigns to two pre-existing variables
Anton
Anton22h ago
they meant the right hand side and they are right semantically it is that
canton7
canton722h ago
Oh duh, (a, b) vs (b, a) yep. I misread.

Did you find this page helpful?