Extracting Cycles_

Cache-friendly reads, user-hostile writes.

Test

2026-01-01

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

C SIMD Math

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat:

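// Parallel bit extract (BMI2 PEXT): gathers the bits of `input` selected by
// `mask` into the low bits of the result. Requires a BMI2-capable x86-64 CPU.
inline fn pext(input: u64, mask: u64) u64 {
    // AT&T operand order is mask, source, destination. PEXT has no side
    // effects, so the asm is not marked `volatile`.
    return asm ("pextq %[mask], %[src], %[ret]"
        : [ret] "=r" (-> u64),
        : [src] "r" (input),
          [mask] "r" (mask),
    );
}

// Parses two dot-terminated three-digit groups (for example "192.168.")
// loaded little-endian into `chunk`. Returns the first value in bits 0..7
// and the second in bits 8..15.
inline fn parseTwoTripletsLow(chunk: u64, active_lanes: u64) !u32 {
    // Subtract '0' from digit bytes and '.' from separator bytes.
    const d = chunk -% (0x2E3030302E303030 & active_lanes);

    // In each 32-bit lane, byte 0 is the hundreds digit, byte 1 the tens,
    // byte 2 the ones.
    const mask: u64 = 0x000000FF000000FF;
    const o = (d >> 16) & mask;
    const t = (d >> 8) & mask;
    const h = d & mask;
    const compact = h * 100 + t * 10 + o;

    // Reject values above 255 (bits set outside a lane's low byte) and any
    // byte out of range: `d` has its high bit set if a byte fell below its
    // base character, and the bias sets it if the hundreds digit exceeds 2,
    // another digit exceeds 9, or the separator is not exactly '.'.
    if (compact & ~mask != 0 or
        (d | (d +% 0x7F76767D7F76767D)) & 0x8080808080808080 != 0)
        return error.Invalid;

    return @intCast(pext(compact, mask));
}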

The overall speedup obtained by accelerating one section of a program is limited by Amdahl’s law:

$$S = \frac{1}{(1 - p) + \frac{p}{s}}$$

Duis aute irure dolor in reprehenderit, where $S$ is the overall speedup, $s$ is the speedup of the improved portion, and $p$ is the fraction of the runtime it affects. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.