Darkmatter Posts - Answer Overflow

Darkmatter

•Created by Darkmatter on 12/21/2024 in #questions

`tensor.tensor.Tensor`, `max.tensor.Tensor` or `max.driver.Tensor`?

Are we meant to use tensor.tensor.Tensor, max.tensor.Tensor, or max.driver.Tensor? The first might be a re-export of the second. max.driver.Tensor is the only one that seems to handle multiple devices, and it looks like a better match for ManagedTensorSlice, but most of the MAX APIs don't seem to work with it. On the other hand, I don't see a way to move data off the GPU with the max.tensor.Tensor/tensor.tensor.Tensor variants, and I get segfaults in what looks like a memcpy from a GPU pointer to a CPU pointer.

2 replies

Modular

•Created by Darkmatter on 12/18/2024 in #questions

How should I be loading the `get_scalar_from_managed_tensor_slice` kernel?

I am apparently missing it, despite using this as my session load:

var model = session.load(
        graph,
        custom_ops_paths=List(
            Path("kernels.mojopkg"),
            Path(".magic/envs/default/lib/mojo/_mlir.mojopkg"),
            Path(".magic/envs/default/lib/mojo/algorithm.mojopkg"),
            Path(".magic/envs/default/lib/mojo/stdlib.mojopkg"),
            Path(".magic/envs/default/lib/mojo/buffer.mojopkg"),
            Path(".magic/envs/default/lib/mojo/autotune.mojopkg"),
            Path(".magic/envs/default/lib/mojo/benchmark.mojopkg"),
            Path(".magic/envs/default/lib/mojo/compile.mojopkg"),
            Path(".magic/envs/default/lib/mojo/compiler.mojopkg"),
            Path(".magic/envs/default/lib/mojo/complex.mojopkg"),
            Path(".magic/envs/default/lib/mojo/compiler_internal.mojopkg"),
            Path(".magic/envs/default/lib/mojo/kv_cache.mojopkg"),
            Path(".magic/envs/default/lib/mojo/gpu.mojopkg"),
            Path(".magic/envs/default/lib/mojo/layout.mojopkg"),
            Path(".magic/envs/default/lib/mojo/max.mojopkg"),
            Path(".magic/envs/default/lib/mojo/runtime.mojopkg"),
            Path(".magic/envs/default/lib/mojo/subprocess.mojopkg"),
            Path(".magic/envs/default/lib/mojo/tensor_utils_internal.mojopkg"),
            Path(".magic/envs/default/lib/mojo/tensor.mojopkg"),
            Path(".magic/envs/default/lib/mojo/tensor_utils.mojopkg"),
            Path(".magic/envs/default/lib/mojo/tensor_internal.mojopkg"),
        ),
    )

6 replies

Modular

•Created by Darkmatter on 12/7/2024 in #questions

Arena Allocated Coroutines

I was watching the Efficient Coroutine Implementation in MLIR talk, and it seems like that design leaves no room for arena-allocating the frames, nor any place to handle a failed coroutine frame allocation. This concerns me: being able to move frames to the stack is nice, but being able to grab a right-sized allocation from an arena allocator is nicer, especially when you need to ensure there is enough memory for the coroutine. For frequently allocated coroutines (consider the handle_request top-level function of an HTTP server), this means that instead of going through all of the machinery in tcmalloc, you may be performing a dequeue operation on a ring buffer of free frames, which is substantially faster. Would it be possible to have the coroutine take an alloc: Allocator[CoroutineFrameType] = DefaultMojoAllocator parameter in some way, or otherwise inject an allocator into the coroutine? I'm still thinking over how I would want custom allocators to behave, but I know that this is a feature I and others will want. As for my specialty of databases, the database is likely the largest memory consumer on any system it runs on and typically has a lot of caching, so it can actually do something about allocation failures. Not being able to handle them means you can't use the feature in production code, because it could lead to unnecessary crashes.
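
To make the "dequeue from a ring buffer of free frames" idea concrete, here is a minimal sketch in Rust rather than Mojo. Everything in it is an assumption for illustration: `FramePool`, `FRAME_SIZE`, and the method names are hypothetical and are not part of any Mojo or MLIR coroutine API. The point it shows is that allocation becomes a queue pop and that exhaustion is reported to the caller instead of aborting.

```rust
use std::collections::VecDeque;

/// Assumed fixed frame size; a real design would size this per coroutine type.
pub const FRAME_SIZE: usize = 256;

/// Hypothetical pool of preallocated, equally sized coroutine frames.
pub struct FramePool {
    storage: Vec<[u8; FRAME_SIZE]>,
    /// Ring buffer holding the indices of frames that are currently free.
    free: VecDeque<usize>,
}

impl FramePool {
    /// Preallocates `n` frames up front, so later allocations never touch the heap.
    pub fn with_capacity(n: usize) -> Self {
        FramePool {
            storage: vec![[0u8; FRAME_SIZE]; n],
            free: (0..n).collect(),
        }
    }

    /// Fallible allocation: a dequeue, returning `None` when the pool is exhausted.
    pub fn try_alloc(&mut self) -> Option<usize> {
        self.free.pop_front()
    }

    /// Returns a frame to the pool by enqueueing its index.
    pub fn release(&mut self, idx: usize) {
        debug_assert!(idx < self.storage.len());
        self.free.push_back(idx);
    }

    /// Gives access to the bytes backing an allocated frame.
    pub fn frame_mut(&mut self, idx: usize) -> &mut [u8; FRAME_SIZE] {
        &mut self.storage[idx]
    }
}
```

A caller such as a database could react to `None` by evicting cache entries and retrying, which is the kind of recovery the question asks to make possible.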

2 replies

Modular

•Created by Darkmatter on 9/17/2024 in #questions

MAX for fixed-function hardware

Does MAX have the ability to represent fixed-function hardware or hardware with limited programmability? Examples of the former are cryptographic accelerators (Intel QAT, AMD CCP) and NPUs (Intel, AMD, Qualcomm, Apple). By "limited programmability", I mean devices like P4 NICs, which are not quite as flexible as FPGAs but are still flexible enough that a programming language (P4) was required to use them. The language is not Turing-complete by design, so that fixed-function ASICs can implement it easily.

5 replies

Modular

•Created by Darkmatter on 9/17/2024 in #questions

Mojo equivalent to Rust's phantom type

Does Mojo have an equivalent to Rust's phantom type? I have a few use cases where I need a raw pointer (for syscalls) to be assigned a lifetime via a wrapper, such as for iovec.
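
For reference, this is roughly what the Rust pattern being asked about looks like, as a minimal sketch. `RawIoVec` and `IoVec` are hypothetical names declared by hand so the example needs no external crates; the field layout is assumed to mirror POSIX `struct iovec` (a base pointer followed by a length). `PhantomData` adds no bytes to the struct but makes the borrow checker treat the wrapper as if it held a reference to the buffer.

```rust
use std::marker::PhantomData;

/// Hand-declared stand-in for POSIX `struct iovec` (assumed layout: pointer, then length).
#[repr(C)]
pub struct RawIoVec {
    pub base: *const u8,
    pub len: usize,
}

/// Hypothetical wrapper that ties the raw pointer to the lifetime of the borrowed buffer.
#[repr(transparent)]
pub struct IoVec<'a> {
    raw: RawIoVec,
    /// Zero-sized marker: behaves like a `&'a [u8]` for lifetime checking only.
    _borrow: PhantomData<&'a [u8]>,
}

impl<'a> IoVec<'a> {
    /// Borrows `buf` for `'a`, so the buffer cannot be dropped while the iovec exists.
    pub fn new(buf: &'a [u8]) -> Self {
        IoVec {
            raw: RawIoVec { base: buf.as_ptr(), len: buf.len() },
            _borrow: PhantomData,
        }
    }

    /// The raw struct that would be handed to a syscall such as `writev`.
    pub fn as_raw(&self) -> &RawIoVec {
        &self.raw
    }

    /// Number of bytes described by this iovec.
    pub fn len(&self) -> usize {
        self.raw.len
    }
}
```

Because the marker is zero-sized, `IoVec` has the same size as `RawIoVec`, so an array of wrappers could be passed where an array of raw iovecs is expected.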

2 replies

Modular

•Created by Darkmatter on 8/14/2024 in #questions

Reliability of Niche Optimizations

Is this behavior that I can rely on? I'm going to write some code that will blow up spectacularly on ARM if this is ever not true.

sizeof[Optional[fn(...) -> ...]]() == sizeof[UnsafePointer[_]]()
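
This does not settle whether Mojo guarantees the layout. For comparison, Rust documents that `Option<fn(...)>` has the same size as the bare function pointer, and the usual way to depend on such a property is a compile-time assertion, so a violation breaks the build instead of failing at run time. The sketch below is Rust, not Mojo; `Callback` is a hypothetical type chosen for illustration, and whether Mojo offers an equivalent compile-time check for `Optional` is an assumption left open here.

```rust
use std::mem::size_of;

/// Hypothetical callback type used only for illustration.
type Callback = fn(u64) -> u64;

// Compile-time guard: the build fails if the optional is larger than the bare pointer.
const _: () = assert!(size_of::<Option<Callback>>() == size_of::<Callback>());

/// Run-time form of the check from the question: optional function pointer
/// versus a plain data pointer.
pub fn optional_fn_is_pointer_sized() -> bool {
    size_of::<Option<Callback>>() == size_of::<*const ()>()
}
```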

2 replies

Modular

•Created by Darkmatter on 7/20/2024 in #questions

Is there any documentation for Mojo MLIR dialect(s)?

I have embarked on a perhaps foolish quest to make calling into C easier by using mlir-translate to import LLVM IR (since clang isn't MLIR-ready yet). This also involves a bit of -fdebug-macro, and converting the resulting macros into constants for Mojo. While I can infer some things from error messages, proper documentation would make this much easier. The intention is to make liberal use of inline MLIR for function bodies, and then do my best to translate structs, function signatures, and constants.

3 replies

Modular

•Created by Darkmatter on 7/3/2024 in #questions

Does Mojo have inline assembly?

Does Mojo currently have a blessed way to do inline assembly? LLVM doesn't have intrinsics for ARM MSR reads, and the two best clocks on an ARM system from a microbenchmark perspective are cntvct_el0 and pmccntr_el0, both of which are MSRs. In C I would normally read them as follows:

uint64_t tsc;

__asm__ volatile("isb" ::: "memory");
__asm__ volatile("mrs %0, cntvct_el0" : "=r" (tsc));

uint64_t tsc;
__asm__ volatile("isb" ::: "memory");
__asm__ volatile("mrs %0, pmccntr_el0" : "=r"(tsc));

1 reply

Modular

•Created by Darkmatter on 6/25/2024 in #questions

Any Plans for SPIR-V/OpenCL Support?

Are there any plans for a SPIR-V target for Mojo or an OpenCL backend for MAX? It would be useful because it would allow leveraging any OpenCL-compatible hardware rather than only what Modular specifically adds support for. In particular, FPGAs offer very good price/performance for inference on small models when looking at major cloud providers.

2 replies

Modular

•Created by Darkmatter on 5/19/2024 in #questions

JSON Parser?

Does Mojo have a JSON parser library somewhere in the ecosystem? My searches keep colliding with C and Perl libraries of the same name, so I can't find anything. Python interop isn't quite at the level where I can use the Python stdlib one. This is for reading some config data, so performance is not a concern.

2 replies
