C
C#β€’9mo ago
jordinho

❔ Checking if a string only contains specified characters (Order doesn't matter)

I am trying to validate user input. My current approach is to iterate over every char of the input string and check wether the char has been specified. Is there any more efficient way?
class Navigation
{
public string Input { get; }

private List<char> _allowedCharacters { get; }

public Navigation()
{
AllowedCharacters = new List<char>()
{
'*', '/', '+', '-',
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
};
}

public void UserInput()
{
bool invalidInput = true;

while (invalidInput)
{
Console.Write(">>> ");
Input = Console.ReadLine();

for (int i = 0; i < Input.Length; i++)
{
if (_allowedCharacters .Contains(Input[i]))
{
if (i == Input.Length - 1)
{
invalidInput = false;
break;
}
}
else
{
Console.WriteLine("Invalid\n");
invalidInput = true;
break;
}
}
}
}
}
class Navigation
{
public string Input { get; }

private List<char> _allowedCharacters { get; }

public Navigation()
{
AllowedCharacters = new List<char>()
{
'*', '/', '+', '-',
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
};
}

public void UserInput()
{
bool invalidInput = true;

while (invalidInput)
{
Console.Write(">>> ");
Input = Console.ReadLine();

for (int i = 0; i < Input.Length; i++)
{
if (_allowedCharacters .Contains(Input[i]))
{
if (i == Input.Length - 1)
{
invalidInput = false;
break;
}
}
else
{
Console.WriteLine("Invalid\n");
invalidInput = true;
break;
}
}
}
}
}
26 Replies
WEIRD FLEX
WEIRD FLEXβ€’9mo ago
there could be a shorter way if you use linq, but more efficient... i mean i don't know if you really want to squeeze out performance in this code, if it's really that important
ZacharyPatten
ZacharyPattenβ€’9mo ago
1. write unit tests for code like this 2. don't optimize code until you have diagnosed a bottleneck using proper benchmarking and profiling 3. use BenchmarkDotNet for your benchmarks optimizing user input validation like this is typically a worthless task. you care more about your validation being accurate, not being fast in this case... unit tests > benchmarks
David_F
David_Fβ€’9mo ago
@jordinho .IndexOfAny Method on string
ZacharyPatten
ZacharyPattenβ€’9mo ago
why...? looping is just fine here and in practice most apps tend to use regex anyways
David_F
David_Fβ€’9mo ago
Because IndexOfAny will be the most performant option and it will shorten the code to one line
ZacharyPatten
ZacharyPattenβ€’9mo ago
no one will ever care about performance in this code ever it is meaningless to optimize this you care more about the code being clean, easy to read, and logically sound
David_F
David_Fβ€’9mo ago
did you read the part where I said it will shorten the code to one line?
ZacharyPatten
ZacharyPattenβ€’9mo ago
there are lots of ways to reduce this down to one line
David_F
David_Fβ€’9mo ago
ha-ha you're trolling, right? πŸ™‚
ZacharyPatten
ZacharyPattenβ€’9mo ago
no
HashSet<char> validChars = new() { '*', '/', '+', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
Console.Write("Provide Input: ");
string? userInput = Console.ReadLine();
while (userInput is null || userInput.Any(c => !validChars.Contains(c)))
{
Console.WriteLine("Invalid Input. :(");
Console.Write("Provide Input: ");
userInput = Console.ReadLine();
}
Console.WriteLine("Valid Input. :)");
HashSet<char> validChars = new() { '*', '/', '+', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
Console.Write("Provide Input: ");
string? userInput = Console.ReadLine();
while (userInput is null || userInput.Any(c => !validChars.Contains(c)))
{
Console.WriteLine("Invalid Input. :(");
Console.Write("Provide Input: ");
userInput = Console.ReadLine();
}
Console.WriteLine("Valid Input. :)");
a HashSet will likely outperform IndexOfAny depending on the number of allowed characters because it is using hashing rather than nested looping through the valid chars
using System.Text.RegularExpressions;
Regex regex = new(@"^[0-9*/+-]+$", RegexOptions.Compiled);
Console.Write("Provide Input: ");
string? userInput = Console.ReadLine();
while (userInput is null || !regex.IsMatch(userInput))
{
Console.WriteLine("Invalid Input. :(");
Console.Write("Provide Input: ");
userInput = Console.ReadLine();
}
Console.WriteLine("Valid Input. :)");
using System.Text.RegularExpressions;
Regex regex = new(@"^[0-9*/+-]+$", RegexOptions.Compiled);
Console.Write("Provide Input: ");
string? userInput = Console.ReadLine();
while (userInput is null || !regex.IsMatch(userInput))
{
Console.WriteLine("Invalid Input. :(");
Console.Write("Provide Input: ");
userInput = Console.ReadLine();
}
Console.WriteLine("Valid Input. :)");
However, in practice, it is common for people to use Regex for these types of things regex allows you to have the pattern as a configurable property that can be modified without source code changes if needed, but also regex is a common pattern language with lots of documentation and industry knowledge/standard
David_F
David_Fβ€’9mo ago
Use this pattern in Regex source gen and look at the generated code, probably it will use IndexOfAny You can becnmark IndexOfAny against your HashSet solution and see what happens. IndexOfAny applies vectorization so I doubt anything will outperform it
ZacharyPatten
ZacharyPattenβ€’9mo ago
@David_F here you go. add your recommendation: https://gist.github.com/ZacharyPatten/5477ad5287ae287b8b9e88daf7464952
David_F
David_Fβ€’9mo ago
@ZP β–‘β–’β–“β–ˆβ”œβ– ΜΆΛΎΜΆΝžβ– β”€β–ˆβ–“β–’β–‘ was it hard to write down input.IndexOfAny(array) >= 0 ? https://learn.microsoft.com/en-us/dotnet/api/system.string.indexofany?view=net-7.0
String.IndexOfAny Method (System)
Reports the index of the first occurrence in this instance of any character in a specified array of Unicode characters. The method returns -1 if the characters in the array are not found in this instance.
David_F
David_Fβ€’9mo ago
or better omit the return value of the call like you did with others
ZacharyPatten
ZacharyPattenβ€’9mo ago
that isn't the correct logic, that will get the first index of the first valid character. it will not get the index of any invalid characters
David_F
David_Fβ€’9mo ago
In your Regex() and HashSet() you don't find indexes either, right?
ZacharyPatten
ZacharyPattenβ€’9mo ago
the point of this code is to validate the entire string (user input). getting the indexes isn't necessary beyond the goal of validating the string.
In your Regex() and HashSet() you don't find indexes either, right?
I'm not sure exactly what you are asking
WEIRD FLEX
WEIRD FLEXβ€’9mo ago
getting the indexes isn't necessary
yeah but in the future having the error pointing at or marking a precise place in the string wouldn't be awful to me
David_F
David_Fβ€’9mo ago
Then use input.AsSpan().IndexOfAnyExcept() method @ZP β–‘β–’β–“β–ˆβ”œβ– ΜΆΛΎΜΆΝžβ– β”€β–ˆβ–“β–’β–‘ I understand that we don't need to continue the validation further if we find a character that is not valid, right?
ZacharyPatten
ZacharyPattenβ€’9mo ago
IndexOfAnyExcept
yes that would likely be correct πŸ˜‰
David_F
David_Fβ€’9mo ago
I ran your benchmark @ZP β–‘β–’β–“β–ˆβ”œβ– ΜΆΛΎΜΆΝžβ– β”€β–ˆβ–“β–’β–‘ , Regex is the best and HashSet with LINQ is the worst.
David_F
David_Fβ€’9mo ago
Regex generates this code
// Match a character in the set [*+-/-9] atomically at least once.
{
int iteration = 0;
while ((uint)iteration < (uint)slice.Length && ((long)((0xD7FF000000000000UL << (int)(charMinusLow = (uint)slice[iteration] - '*')) & (charMinusLow - 64)) < 0))
{
iteration++;
}

if (iteration == 0)
{
return false; // The input didn't match.
}

slice = slice.Slice(iteration);
pos += iteration;
}
// Match a character in the set [*+-/-9] atomically at least once.
{
int iteration = 0;
while ((uint)iteration < (uint)slice.Length && ((long)((0xD7FF000000000000UL << (int)(charMinusLow = (uint)slice[iteration] - '*')) & (charMinusLow - 64)) < 0))
{
iteration++;
}

if (iteration == 0)
{
return false; // The input didn't match.
}

slice = slice.Slice(iteration);
pos += iteration;
}
ZacharyPatten
ZacharyPattenβ€’9mo ago
yeah I updated the gist too not that is matters because all of theme have relatively similar performance and will not be a bottleneck but o well that was fun keep in mind I didn't increase the # valid characters though. Hashset efficiency will improve as that goes up in comparison to the other methods
David_F
David_Fβ€’9mo ago
I don't understand how HashSet with Linq alllocated so much...the lamda is static....
ZacharyPatten
ZacharyPattenβ€’9mo ago
always avoid delegates if you want performance struct generic parameters > delegates
Accord
Accordβ€’9mo ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.