Extracting Cycles

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

C SIMD Math

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat:

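// Wrapper around the x86-64 BMI2 PDEP instruction: scatters the low-order
// bits of `input` into the bit positions that are set in `mask`.
inline fn pdep(input: u64, mask: u64) u64 {
    var ret: u64 = undefined;
    asm volatile ("pdepq %[mask], %[src], %[dst]"
        : [dst] "=r" (ret),
        : [src] "r" (input),
          [mask] "r" (mask),
    );
    return ret;
}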

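// Parses two "ddd." groups, one per 32-bit lane of `chunk`, with the hundreds
// digit in each lane's lowest byte.
// NOTE: `pext` is not defined in this excerpt; it is assumed to wrap the BMI2
// PEXT instruction in the same way `pdep` above wraps PDEP.
inline fn parseTwoTripletsLow(chunk: u64, active_lanes: u64) !u32 {
    // Subtract ASCII "000." from each lane selected by `active_lanes`:
    // valid digit bytes become 0..9 and a '.' byte becomes 0.
    const d = chunk -% (0x2E3030302E303030 & active_lanes);

    // Move each digit byte into its lane's low byte, then combine them.
    const mask: u64 = 0x000000FF000000FF;
    const o = (d >> 16) & mask; // ones
    const t = (d >> 8) & mask; // tens
    const h = d & mask; // hundreds
    const compact = h * 100 + t * 10 + o;

    // Reject lane values above 255 and any byte out of range: hundreds digit
    // above 2, tens or ones digit above 9, or a nonzero separator byte.
    if (compact & ~mask != 0 or
        (d | (d +% 0x7F76767D7F76767D)) & 0x8080808080808080 != 0)
        return error.Invalid;

    // Pack the two lane values into adjacent bytes of the result.
    return @intCast(pext(compact, mask));
}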

The overall speedup gained by improving one portion of a program is bounded by Amdahl’s law:

$S(s, p) = \frac{1}{(1 - p) + \frac{p}{s}}$

Duis aute irure dolor in reprehenderit, where $s$ is the speedup of the improved portion and $p$ is the fraction of the runtime it affects. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
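
As a worked illustration with assumed numbers (they are not measurements from this text), suppose the optimized routine accounts for $p = 0.8$ of the runtime and is made $s = 4$ times faster. Substituting into the formula above gives:

$S(4, 0.8) = \frac{1}{(1 - 0.8) + \frac{0.8}{4}} = \frac{1}{0.2 + 0.2} = 2.5$

Even if the routine became arbitrarily fast, the overall speedup could not exceed:

$\lim_{s \to \infty} S(s, 0.8) = \frac{1}{1 - 0.8} = 5$

Under these assumptions, the untouched 20% of the runtime caps the total gain at a factor of 5, no matter how many cycles are shaved from the optimized part.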

Test