Coursework analysis of quicksort results in (very) unexpected behaviour
Desc:
Implemented simple recursive quicksort algorithm to test it on a 'A' shaped arrays of different sizes (eg. [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]).
Expected behaviour: (BASED ON MY THOUGHTS)
- rightmost pivot: O(n^2) - at least for 'A' shaped arrays
- middle pivot: O(n^2) - at least for 'A' shaped arrays
- random pivot: from O(n log n) to O(n^2)
Actual results are on the diagram.
Problems:
- random pivot ends up being basically instant (0-2ms) even for arrays of size 72 000
- middle pivot ends up taking 2x the amount of rightmost pivot
I also tested it against random shuffled arrays using Random.Shuffle, and for some reason introducing random to all pivot strategies makes all the timings >2ms, so it looks like using random makes any pivot strategy instant. I got no idea why it happens, I've also made sure that all arrays are not sorted before sorting, and sorted after sorting.
Code: https://pastebin.com/3zF7SLbz
.NET 8
VS2022
Please help..
Pastebin
using System;using System.IO;using System.Collections.Generic;using...
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
31 Replies
QuickSort method:
Partition method:
Utils:
Main:
For starters, use BenchmarkDotNet for benchmarking. There are all sorts of confounding factors which can influence micro-benchmarks, such as caches being warmed up, or the JIT deciding to optimise your code further because it's being called a lot
I'm gonna go and try that right away
I also don't see you calling IsSorted anywhere? So if you had a bug in your quicksort which resulted in it not sorting anything, you probably wouldn't notice?
it was commented out because I wanted to give here the clean code
and I had to trim it when pasting into discord
but I did test it, it wasn't sorted before every sorting and was sorted after every sorting
Your code looks fine
I wonder whether the cost of the swaps is dominating? For a rightmost pivot, on the first half of the array you're only ever swapping elements with themselves, and on the second half you're not swapping at all. But with a middle pivot, for the second half of the array, you're doing a ton of swaps between non-adjacent elements (I think?). When you're dealing with large partitions, that might end up being quite cache-unfriendly
The thing to do would be to BenchmarkDotNet it to see whehter the effect is real (or where it is a cold cache / JIT / etc issue), and then run it in a profiler to see where your time's actually being spent
I just ran it, results are basically the same
Right most ends up being ~4x faster this time here (JIT?)
As in, the same as reported in your question? Or the different pivot strategies have the same cost?
random is instant
as in the main post
wait a second
shouldn't rightmost & middle pivot selection strategies be virtually identical?
I don't think so?
look
middle pivot in first iteration gives you pivot being the largest number in the array, so you will end up with first partition full of numbers and the second one empty, because there's no elements larger than the pivot
and the rightmost pivot on first iteration gives you pivot being the smaller nubmer which ends up with first partition empty and the second one full
so it's just mirrored no?
The first iteration is always going to be the odd one out: it's all of the ones after it which matter I think
At some point I think you must end up either sorting ascending or descending sequences surely?
okay but when you apply same logic to the partitions from 1st iteration, we also get 2 partitions with 1 being empty and the other one being full
And spend the majority of your time on those
it returns only if it's given an array of size 1, then everything is being put back altogether
idk tbh im getting mad at this
Hmm yeah you might be right about the middle pivot. I'd have to work it through, but the middle pivot might always be the largest element, so you end up being strictly left-recursive
Time to take a break 😄
exactly, mirror version of the right-recursive right?
im just gonna do that in python tbh im tired of this
Sorry, I needed to send this
I FOUND IT
guys omg
rightmost vs middle
pivot points have similar amount of calls but not SWAPS
when we choose righmost pivot
, the if-statement in Partition:
is true only for 1 element (at least on a [1 2 3 4 5 4 3 2 1]
array), therefore it executes 1 swap
because pivotValue == 1
MEANWHILE when we choose middle element in an A shaped array, we get ~array.Length
swaps
I've implemented swapCount
and callCount
and confirmed that middle pivot does ~2-8x more swaps than the rightmost pivot
:EZ:I recommend you use spans here
they make it much simpler conceptually
out of curiosity, what's simpler in a span? I haven't worked with it yet, got no idea how they work, and although they might be faster/more optimal, I think array is just fine here because it exhausts the needs of this algorithm just right, without making code unreadable/complicated
A span can represent a segment of an array. So instead of passing the array, left, and right as 3 separate parameters, you can just pass a single span
Array is more complex than span, being a higher level abstraction. Span is literally the simplest way to indicate a block of memory.
They probably won't be much faster
The code will be easier to understand though
yeah I get it now, it actually looks very logically correct when you think about it
would swapping elements of a span propagate to the original array though?
Yes
A swap is just setting two values, you just happen to be setting each to the other's value
Btw, the syntax
(a, b) = (b, a)
works to do a swap, and it's easier than a function callWon't this create a (b,a) tuple, then unpack it?
afaik it's recognized by the compiler as a swap
No. The
(a, b)
syntax on the left-hand side of an assignment never creates a tuple
Eg var (x, y) = ...
is used to declare two local variables x
and y
. (a, b) = ...
assigns to two pre-existing variablesthey meant the right hand side
and they are right
semantically it is that
Oh duh,
(a, b)
vs (b, a)
yep. I misread.