character/word/sentence counter help

need hints
111 Replies
ἔρως
ἔρως2w ago
https://discord.com/channels/436251713830125568/1355871422639112322 that's a bloody good hint, as the british would say
bfmv
bfmvOP2w ago
i forgot to paste the code
let completeStr
let strSentenceCount

inputBox.addEventListener("keyup", (event) => {
  let strWordCount = 0
  completeStr = inputBox.value
  let strCharacterCount = completeStr.length

  for (key of completeStr) {
    if (key === " "){
      strWordCount++
    }
  }
  console.log(strWordCount)

})
ἔρως
ἔρως2w ago
it's not just the code what are you trying to do? which problems are you finding? any errors? all that is part of the question
bfmv
bfmvOP2w ago
for (key of completeStr) {
  if (key === " "){
    strWordCount++
  }
}
here my intention is to increment the strWordCount variable every time the " " key is pressed
ἔρως
ἔρως2w ago
that's not how word counting works
bfmv
bfmvOP2w ago
yes im thinking to add && in the if condition
ἔρως
ἔρως2w ago
that still isn't how a word works first, you have to decide on what is a word
bfmv
bfmvOP2w ago
what to do, rookie logic😔
ἔρως
ἔρως2w ago
what is a word?
bfmv
bfmvOP2w ago
yeah any letter
ἔρως
ἔρως2w ago
hello <-- so, 5 words?
bfmv
bfmvOP2w ago
no
ἔρως
ἔρως2w ago
so, it isn't "any letter"
bfmv
bfmvOP2w ago
once the user presses " ", it's one word
ἔρως
ἔρως2w ago
then h i is how many words?
bfmv
bfmvOP2w ago
yeah thats the problem im thinking to solve with the "&&" in if condition
ἔρως
ἔρως2w ago
you're counting spaces, not words
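A small sketch of why the loop above counts spaces rather than words. `countSpaces` is a hypothetical helper name used only for illustration; it mirrors the loop in the question rather than anything the project brief specifies.

```javascript
// Hypothetical helper: counts " " characters, which is what the loop in the question does.
function countSpaces(text) {
  let spaces = 0;
  for (const ch of text) {
    if (ch === " ") {
      spaces++;
    }
  }
  return spaces;
}

console.log(countSpaces("hello"));  // 0, although most people would see one word
console.log(countSpaces("h i"));    // 1, although most people would see two words
console.log(countSpaces("a   b ")); // 4 (three spaces in the middle, one trailing)
```

The number of spaces only matches the number of words by accident, which is why the definition of a word has to come first.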
bfmv
bfmvOP2w ago
right what do i do
ἔρως
ἔρως2w ago
don't count spaces count words but first, what is a word?
bfmv
bfmvOP2w ago
how do i count words
ἔρως
ἔρως2w ago
you define what is a word
bfmv
bfmvOP2w ago
yeah idk they didnt mention it
ἔρως
ἔρως2w ago
there's the naïve "anything between spaces is a word", which you can do with value.split(' ').length, but that's also not how words work. for example, - would be a word, and no.nope would be 1 word too
bfmv
bfmvOP2w ago
oh yeah
ἔρως
ἔρως2w ago
now, you can decide to split by non-"word" characters, but 123456 would be 6 words also, what about 👍 ? that's an emoji - and it can have 1-2+ characters
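The emoji point also affects the character count, since `.length` counts UTF-16 code units. A minimal sketch of the difference follows; the variable names are made up for this example, and the strings are written with escapes so the exact code points are unambiguous.

```javascript
// "\u{1F44D}" is the thumbs-up emoji; it lies outside the Basic Multilingual Plane.
const thumbsUp = "\u{1F44D}";

// .length counts UTF-16 code units, so this single emoji counts as 2.
console.log(thumbsUp.length);      // 2

// Spreading a string iterates by code point instead.
console.log([...thumbsUp].length); // 1

// Combining marks are separate code points: "e" + U+0301 renders as one accented letter.
const decomposed = "e\u0301";
console.log([...decomposed].length);             // 2
console.log(decomposed.normalize("NFC").length); // 1
```

Whether a "character" means a code unit, a code point, or what the user sees on screen is the same kind of decision as what counts as a word.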
bfmv
bfmvOP2w ago
i think this whole project is goofy
ἔρως
ἔρως2w ago
what about z̴͐͘a̷̋̽ľ̸̀g̸̍̀o̷͛͛? is that trillions of words? just 1 with trillions of ligatures and accents and weirdness? you need a definition on what is a word
bfmv
bfmvOP2w ago
wow they didnt give it what do i do
ἔρως
ἔρως2w ago
well, then you decide what's "good enough" or just use someone else's code 🤣
bfmv
bfmvOP2w ago
should a beginner do this project
ἔρως
ἔρως2w ago
yes
bfmv
bfmvOP2w ago
there seem to be much better projects
ἔρως
ἔρως2w ago
but the problem is badly defined, if it doesn't say what counts as a word
bfmv
bfmvOP2w ago
right i also have to count sentences and they didnt mention its definition either 😔
ἔρως
ἔρως2w ago
but first, you need to know what is a word, because a sentence is a set of 1 or more words
bfmv
bfmvOP2w ago
yeah how do i decide
ἔρως
ἔρως2w ago
depends on what you consider a word
bfmv
bfmvOP2w ago
how is - a word or +
ἔρως
ἔρως2w ago
is abso-fucking-lutely a word? or 3 words?
bfmv
bfmvOP2w ago
no idea
ἔρως
ἔρως2w ago
then you have to decide
bfmv
bfmvOP2w ago
if i decide its 1 word then it will be problematic
ἔρως
ἔρως2w ago
natural language processing is the subject of many books and basically it is what you need to do but that's hard, so, you compromise and make something "good enough"
Jochem
Jochem2w ago
this is way too philosophical a question for a word counter. just split the value of the textarea by " ", take the length, and call it a day
ἔρως
ἔρως2w ago
i know, but you need to split the text into "tokens". that is absolutely valid, but he also needs to count sentences, which is very easy: it's just split by a blob of any of . ? ! together, like no!?, or the end of the text
bfmv
bfmvOP2w ago
i think i should count " - " as a word too or ? ! etc
ἔρως
ἔρως2w ago
i wouldn't count those as words i would split by anything that isn't a letter or number 1 or more of those
bfmv
bfmvOP2w ago
oh yeah
ἔρως
ἔρως2w ago
for example, in he, when (yesterday), cooked the ), has to be split off without being counted as 3 words or even as 1. but remember: you're aiming for "ok enough"
bfmv
bfmvOP2w ago
yeah
ἔρως
ἔρως2w ago
if you're happy with just splitting by space and counting everything that isn't empty, that's fine too as long as you define what is and isn't a word
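A minimal sketch of "split and count everything that isn't empty". `countWordsBySpace` is a hypothetical name, and splitting on `/\s+/` instead of a single space is an assumption made here so that newlines and tabs also separate words.

```javascript
// Hypothetical helper: a "word" is any non-empty run of characters between whitespace.
function countWordsBySpace(text) {
  return text
    .split(/\s+/)                     // split on runs of whitespace
    .filter((piece) => piece !== "")  // drop the empty pieces at the edges
    .length;
}

// Plain split(" ") without the filter miscounts in simple cases:
console.log("".split(" ").length);       // 1, even though the box is empty
console.log("a  b".split(" ").length);   // 3, because of the empty piece between the spaces

console.log(countWordsBySpace(""));          // 0
console.log(countWordsBySpace("  h   i  ")); // 2
console.log(countWordsBySpace(" he , when ( yesterday ) , cooked ")); // 8, symbols still count
```

Under this definition a lone `-` or `(` is still a word, so it only fits if that trade-off is acceptable.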
bfmv
bfmvOP2w ago
yeah but it doesnt feel satisfying
ἔρως
ἔρως2w ago
that's because there's more nuance to it but it's enough, most of the time
bfmv
bfmvOP2w ago
if i do that then " he , when ( yesterday ) , cooked " will be 8 words wtf 😔
ἔρως
ἔρως2w ago
you're right
bfmv
bfmvOP2w ago
why is this project under junior category
ἔρως
ἔρως2w ago
you can always ignore the symbols
bfmv
bfmvOP2w ago
oh yeah
Jochem
Jochem2w ago
there's another good one: count word boundaries with a regex and divide by 2 (because each word has two word boundaries), with a little bit of extra logic for an empty text box showing 0 instead of 1:
wordCount = text.length?Math.ceil(text.split(/\b/).length / 2):0;
ἔρως
ἔρως2w ago
that's a pretty good way to do it
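To see what the `\b` one-liner does, it can be wrapped in a function and tried on a few inputs. `countWordsByBoundary` is a hypothetical name; the body is the expression from the message above.

```javascript
// Hypothetical wrapper around the \b one-liner from the thread.
function countWordsByBoundary(text) {
  return text.length ? Math.ceil(text.split(/\b/).length / 2) : 0;
}

// split(/\b/) alternates between word pieces and the separators between them.
console.log("hello world".split(/\b/));                            // [ 'hello', ' ', 'world' ]

console.log(countWordsByBoundary(""));                             // 0
console.log(countWordsByBoundary("hello world"));                  // 2
console.log(countWordsByBoundary("he, when (yesterday), cooked")); // 4

// Caveats: text that both starts and ends with a non-word character is overcounted,
// and whitespace-only text still counts as 1.
console.log(countWordsByBoundary(" hi "));                         // 2
console.log(countWordsByBoundary("  "));                           // 1
```

Trimming the text first would handle the whitespace cases, but leading or trailing punctuation would still be overcounted, so this stays in "good enough" territory.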
bfmv
bfmvOP2w ago
😵‍💫
Jochem
Jochem2w ago
because you're supposed to half-ass the counting. They effectively want you to count spaces, period/question mark/exclamation mark and call it a day
bfmv
bfmvOP2w ago
i wanna ignore symbols and stuff
ἔρως
ἔρως2w ago
they just want to get you out of the box, think for a little and implement something ok enough
Jochem
Jochem2w ago
what they're expecting of you is this:
characterCount = text.length;
wordCount = text.split(' ').length;
sentenceCount = text.replace('?', '.').replace('!', '.').split('.').length
ἔρως
ἔρως2w ago
the \b already does that
Jochem
Jochem2w ago
or maybe sentenceCount = text.split(/\.|\?|\!/).length
bfmv
bfmvOP2w ago
ohhh
Jochem
Jochem2w ago
maybe
bfmv
bfmvOP2w ago
is it this simple
ἔρως
ἔρως2w ago
i would do /[.?!]+/
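A sketch of sentence counting with that character class. `countSentences` is a hypothetical name, and filtering out blank pieces is an assumption added here, because `split` returns an empty trailing piece when the text ends with punctuation.

```javascript
// Hypothetical helper: a "sentence" is any non-blank stretch between runs of . ? !
function countSentences(text) {
  return text
    .split(/[.?!]+/)                         // "no!?" ends one sentence, not two
    .filter((piece) => piece.trim() !== "")  // ignore empty or whitespace-only pieces
    .length;
}

// Without the filter, the trailing empty piece inflates the count:
console.log("Hi. Bye.".split(/[.?!]+/).length);        // 3

console.log(countSentences("Hi. How are you? Fine!")); // 3
console.log(countSentences("no!?"));                   // 1
console.log(countSentences("No punctuation"));         // 1
console.log(countSentences(""));                       // 0
```

Abbreviations and decimals such as "e.g." or "3.14" will still be split, which is the usual price of the quick version.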
bfmv
bfmvOP2w ago
because i dont remember other projects in junior section being so hard
Jochem
Jochem2w ago
the quick and dirty version is, which is what the "junior" category would imply
bfmv
bfmvOP2w ago
right
Jochem
Jochem2w ago
like, most of the time, word and sentence counting doesn't have to be perfect to be usable
ἔρως
ἔρως2w ago
just ok enough
Jochem
Jochem2w ago
if I'm supposed to write a 3000 word essay, no one is going to care if it's 3004 or 2997
ἔρως
ἔρως2w ago
if you want perfect, you're diving into the deep deep end
bfmv
bfmvOP2w ago
right
Jochem
Jochem2w ago
and there's a million edge cases
ἔρως
ἔρως2w ago
exactly
Jochem
Jochem2w ago
cause language is a fuck
ἔρως
ἔρως2w ago
and you need a dictionary as well
bfmv
bfmvOP2w ago
ok if the project was asking us to make it perfect, under what difficulty level would this project be in
ἔρως
ἔρως2w ago
impossible
bfmv
bfmvOP2w ago
oh my god thank god
ἔρως
ἔρως2w ago
and im not even joking
bfmv
bfmvOP2w ago
i thought i was slow brain cuz of this project
Jochem
Jochem2w ago
using a regex with \b to find word boundaries is the cleverer solution, because you're offloading figuring out what a word is to whoever wrote the regex engine, but even then \b just looks at the edge between word characters and non-word characters. Word characters are A-Z, a-z, 0-9, and _, and so this: ` ` is three words according to regular expressions
bfmv
bfmvOP2w ago
oh
Jochem
Jochem2w ago
some writing systems put spaces in long numbers too, so 1 200 122 747 is four words
ἔρως
ἔρως2w ago
also, it miserably fails at anything that isn't english
bfmv
bfmvOP2w ago
the tags of this project should be " html css js english "
Jochem
Jochem2w ago
phone numbers in the US:
+1 (555) 555-5555
1 2 3 4
four words
ἔρως
ἔρως2w ago
but it is bloody good for what it is acção would be 4 words too (it means "action" in portuguese) and if you use the "combination" characters, then it is a lot more the ç and the ã would be 3 words each but for the size, it's awesome
bfmv
bfmvOP2w ago
sentenceCount = text.replace('?', '.').replace('!', '.').split('.').length
what would this do
ἔρως
ἔρως2w ago
replaces the first instance of ? and the first instance of ! with . to then split by .
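A short illustration of the "first instance" behaviour. The sample text and variable names are made up for this example.

```javascript
const sample = "Really? Sure? Yes! No!";

// With a string pattern, each replace call changes only the first match.
const once = sample.replace("?", ".").replace("!", ".");
console.log(once); // "Really. Sure? Yes. No!"

// A regex with the g flag replaces every match.
const all = sample.replace(/[?!]/g, ".");
console.log(all);  // "Really. Sure. Yes. No."
```

Splitting `once` by `.` would therefore miss every `?` and `!` after the first of each.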
bfmv
bfmvOP2w ago
oh im switching to other project bruh😭
ἔρως
ἔρως2w ago
it's buggy because .replace with a string only replaces once. splitting by /(?:\s*[.?!]\s*)+/ is good enough; that gives you sentences. you can use /(?:\s*(?:[.?!\W_])\s*)+/ to get all the words in the entire text
bfmv
bfmvOP2w ago
then the whole project is done isnt it
ἔρως
ἔρως2w ago
basically
bfmv
bfmvOP2w ago
hahaha
ἔρως
ἔρως2w ago
but, again, this is using regular expressions the """"""""""expert's"""""""""" way
ἔρως
ἔρως2w ago
regex101
ἔρως
ἔρως2w ago
it's good enough try it
bfmv
bfmvOP2w ago
alright
ἔρως
ἔρως2w ago
it's matching by what is NOT a word; what's left is what is a word. i can optimize it a fair bit more: [.?!\W_\s]+ and if you want what is a word, you can do [^.?!\W_\s]+ (this is for words)
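A sketch of using the "what is a word" pattern with `match` instead of `split`. `extractWords` and `wordPattern` are hypothetical names, and the `g` flag is added here so that `match` returns every word rather than just the first.

```javascript
// The character class from the message above, with the g flag to find all matches.
const wordPattern = /[^.?!\W_\s]+/g;

// Hypothetical helper: returns the list of words, or an empty list when there are none.
function extractWords(text) {
  return text.match(wordPattern) || [];
}

console.log(extractWords("he, when (yesterday), cooked")); // [ 'he', 'when', 'yesterday', 'cooked' ]
console.log(extractWords("no.nope"));                      // [ 'no', 'nope' ]
console.log(extractWords("well-known"));                   // [ 'well', 'known' ]
console.log(extractWords("").length);                      // 0
```

Counting is then `extractWords(text).length`, and no empty pieces need filtering because `match` only returns non-empty runs.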
Jochem
Jochem2w ago
Really? I'd assume accented letters would work properly
ἔρως
ἔρως2w ago
they don't, because ç and ã aren't a-zA-Z0-9
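A quick check of that behaviour, plus one possible Unicode-aware alternative. The alternative is an assumption of this sketch rather than something suggested in the thread: it uses Unicode property escapes, which need the `u` flag. `extractUnicodeWords` is a hypothetical name, and the Portuguese word is written with escapes so it is in precomposed form.

```javascript
// "acção" written with escapes: a, c, U+00E7 (c with cedilla), U+00E3 (a with tilde), o.
const acao = "ac\u00e7\u00e3o";

// Without the u flag, \w only covers A-Z, a-z, 0-9 and _.
console.log(/\w/.test("\u00e7")); // false
console.log(/\w/.test("\u00e3")); // false

// So \b splits the word wherever accented letters meet plain ones (3 pieces here).
console.log(acao.split(/\b/));

// Assumed alternative: letters, combining marks and numbers in any script.
const unicodeWord = /[\p{L}\p{M}\p{N}]+/gu;

function extractUnicodeWords(text) {
  return text.match(unicodeWord) || [];
}

console.log(extractUnicodeWords(acao).length);                // 1
console.log(extractUnicodeWords("a\u00e7\u00e3o e reacao").length); // 3
```

This handles accented letters, but scripts written without spaces between words would still need something smarter.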
Jochem
Jochem2w ago
Huh, til I thought it'd work like sorting
ἔρως
ἔρως2w ago
i was surprised by it too
