Modular • 2y ago
toasty

Iterating over Strings containing Unicode characters?

Has anyone figured out how to do this? Maxim's code works perfectly for counting the code points (runes) in a String:

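from algorithm import vectorize
from sys.info import simdwidthof

alias simd_width_u8 = simdwidthof[DType.uint8]()

fn rune_count_in_string(s: String) -> Int:
    # View the string's UTF-8 bytes as unsigned 8-bit values.
    var p = s._as_ptr().bitcast[DType.uint8]()
    var string_byte_length = len(s)
    var result = 0

    @parameter
    fn count[simd_width: Int](offset: Int):
        # Continuation bytes look like 0b10xxxxxx; every other byte starts a code point.
        result += (
            ((p.load[width=simd_width](offset) >> 6) != 0b10)
            .cast[DType.uint8]()
            .reduce_add()
            .to_int()
        )

    # Process the bytes in SIMD-width chunks; vectorize handles the tail.
    vectorize[count, simd_width_u8](string_byte_length)
    return result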
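
For anyone unfamiliar with the trick, here is a rough Python sketch (not Mojo) of what that count is doing, without the SIMD part. The function name `count_code_points` is made up for illustration. It assumes the input is valid UTF-8, in which case every byte that is not a continuation byte (`0b10xxxxxx`) starts exactly one code point.

```python
def count_code_points(data: bytes) -> int:
    """Count UTF-8 code points by counting non-continuation bytes.

    Assumes `data` is valid UTF-8. Continuation bytes have the bit
    pattern 0b10xxxxxx, so shifting right by 6 leaves 0b10 for them.
    """
    return sum(1 for b in data if (b >> 6) != 0b10)


if __name__ == "__main__":
    text = "héllo 🙂"
    encoded = text.encode("utf-8")
    # The byte length and the code point count differ for non-ASCII text.
    print(len(encoded), count_code_points(encoded))
```

Note that this counts code points, not user-perceived characters: a base letter followed by a combining accent counts as two.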


The ord and chr changes on the nightly branch help with converting a single character, but I still haven't been able to iterate through the code points of a string: iterating byte by byte doesn't work for this use case, because a multi-byte character gets split into its individual bytes.
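
To make the goal concrete, this is roughly the byte-level logic I'm after, sketched in Python rather than Mojo since I don't know which Mojo APIs to use for it. The names `utf8_sequence_length` and `iter_code_points` are hypothetical. The idea is that the leading byte of each UTF-8 sequence says how many bytes the code point occupies, so you can step through the buffer one code point at a time. This is only a sketch: it rejects truncated input and bad continuation bytes, but it does not check for overlong encodings or surrogates.

```python
from typing import Iterator


def utf8_sequence_length(lead: int) -> int:
    """Return the byte length of the UTF-8 sequence starting with `lead`."""
    if lead < 0x80:            # 0xxxxxxx: ASCII, one byte
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: two bytes
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: three bytes
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: four bytes
        return 4
    raise ValueError(f"invalid UTF-8 leading byte: {lead:#04x}")


def iter_code_points(data: bytes) -> Iterator[int]:
    """Yield the code points in `data`, one UTF-8 sequence at a time."""
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated UTF-8 sequence")
        if n == 1:
            cp = data[i]
        else:
            # Keep only the payload bits of the leading byte.
            cp = data[i] & (0xFF >> (n + 1))
            for b in data[i + 1 : i + n]:
                if b >> 6 != 0b10:
                    raise ValueError("expected a continuation byte")
                # Each continuation byte contributes six payload bits.
                cp = (cp << 6) | (b & 0x3F)
        yield cp
        i += n


if __name__ == "__main__":
    for cp in iter_code_points("héllo 🙂".encode("utf-8")):
        print(hex(cp), chr(cp))
```

In Mojo the same loop would presumably read from the string's byte pointer and hand each decoded value to chr, but I haven't found a clean way to write that yet.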

It's all fairly new to me, so I'm curious if anyone has already solved this problem 🙂