Create PIDGENX.md

2023-09-06 21:41:46 -04:00 · 2023-09-06 21:41:46 -04:00 · ab30a46870
parent 48507bdbdb
commit ab30a46870
1 changed files with 237 additions and 0 deletions
--- a/PIDGENX.md
+++ b/PIDGENX.md
@ -0,0 +1,237 @@
 # PIDGENX Validation
 By WitherOrNot
 ## Background
 This writeup assumes that you have already read the [patent for this signature scheme](https://patentimages.storage.googleapis.com/a3/27/c1/3c0948a078cb28/US7587605.pdf).
 The patent itself assumes decent knowledge of elliptic curve cryptography, group theory, and field theory. 
 If you are looking to implement this system yourself and are unfamiliar with the math, please [contact the UMSKT Team](https://umskt.zulipchat.com), 
 as there will not be sufficient room in this writeup to explain all of the math involved.
 ## Patent Summary
 The following vocabulary is borrowed from the patent.
 - $M$, the message, or data, to be validated
 - $K$, a finite field of prime order
 - $p$, the order of $K$
 - $K_3$, an extension field of $K$ of degree 3
 - $K_6$, an extension field of $K_3$ of degree 2
 - $E$, an elliptic curve of the form $y^2 = x^3 + ax + b$
 - $m$, the order of $E/K$
 - $\vec{\alpha}$, the private key vector
 - $H_1(M)$, a hash function which outputs a vector of dimension $n$
 - $H_2(M)$, a hash function which outputs a point in $E/K$
 - $P$, a generator of the order $m$ torsion subgroup of $E/K_6$
 - $S$, the signature point in $E/K$
 - $\vec{Q}$, a collection of $n$ points generated by $P$
 - $e_m(P, Q)$, a Tate pairing of order $m$ between points $P$ and $Q$
 In short, ignoring all of the server-side authentication mechanisms and parameter generation procedures, message validation boils down to the following as per the patent:
 1. Given a message $M$, generate the hash vector $\vec{v} = H_1(M)$
 2. Generate the hash point $T = H_2(M)$
 3. Calculate the point $Q = \sum v_iQ_i$
 4. Calculate $c_1=e_m(Q, T)$
 5. Compare this value to $c=e_m(P, S)$
 6. If $c = c_1$, the message is valid, otherwise the message is invalid
 Generation of a signature is the reverse of this process:
 1. Given a message $M$, generate the hash vector $\vec{v} = H_1(M)$
 2. Generate the hash point $T = H_2(M)$
 3. Generate $\vec{Q}$ with $Q_i = \alpha_i P$
 4. Calculate $\alpha = \sum v_i \alpha_i \pmod{m}$
 5. Calculate $S = \alpha T$
 6. Return the tuple $(M, S)$, and let $\vec{Q}$ be a public parameter
 Note that this signature scheme works due to the bilinearity of the pairing function, as
 $$ Q = \sum v_iQ_i = \sum v_i \alpha_i P = \left(\sum v_i \alpha_i\right) P = \alpha P $$
 $$ e_m(P, S) = e_m(P, \alpha T) = e_m(\alpha P, T) = e_m(Q, T) $$
 ## Practical Validation
 While the patent provides a very comprehensive description of the signature implementation, it fails to completely describe the mechanism used in PIDGENX specifically. 
 This section is devoted to describing the specifics of PIDGENX's validation mechanism.
 ### Public Key Binary Format
 The binary serialzed form of a PIDGENX public key is as follows, given in [ImHex Pattern Language](https://docs.werwolv.net/pattern-language/).
 ```
 u8 size_bignum @ 0x15;
 // Bignums are stored in little endian order
 struct bignum {
    u8 data[size_bignum];
 };
 struct field_data {
    u32 magic; // 0x00112233
    u16 eq256; // must be 0x0100
    padding[2];
    u8 must_be_0; // Enables use of unused values, but parser will error if nonzero
    u8 size_modulus; // Same as size_bignum
    u8 size_order;
    u32 ext_deg1; // Degree of first field extension
    u32 ext_deg2; // Degree of second field extension
    u32 at_0x1f; // unused
    u32 at_0x23; // unused
    u32 at_0x27; // unused
    u32 offset_modulus;
    u32 at_0x2f; // unused
    u32 at_0x33; // unused
    u32 at_0x37; // unused
    u8 max_quotient; // Equals floor((24 ^ 25) / p) - 1, unused
    u8 h1_coeffs[size_modulus]; // H1 radices
    bignum modulus; // Prime modulus of base field
    u8 order[size_order]; // Order of elliptic curve over base field (also a bignum)
    // Polynomials are stored as bytes in order of lowest to highest degree coefficients
    u8 ext_split_poly1[ext_deg1 + 1]; // Minimal polynomial of first field extension
    u8 ext_split_poly2[ext_deg2 + 1]; // Minimal polynomial of second field extension
    // Unused capability: y^2 = x^3 + ec_a_base * x + ec_b_base
    // This curve would be over the base field
    bignum ec_a_base;
    bignum ec_b_base;
    // Elliptic curve: y^2 = x^3 + ax + b
    bignum ec_a; // a coefficient
    bignum ec_b; // b coefficient
 };
 struct ecpoint_k3 {
    bignum x[3];
    bignum y[3];
 };
 struct Pubkey {
    u32 magic; // 0x44556677
    u16 eq256; // must be 0x0100
    padding[2];
    u32 field_data_size;
    field_data field;
    padding[field_data_size + 12 - $];
    ecpoint_k3 points[field.size_modulus]; // Points Qi over twisted curve
    bignum pairing_val[field.ext_deg1 * field.ext_deg2]; // value of Tate pairing between generator P and signature base point S
 };
 Pubkey pubkey @ 0;
 ```
 In total, the public key describes the following:
 - The parameters of $E$: $a$, $b$, and $m$
 - The modulus of $K$: $p$
 - $F_1$, the splitting polynomial of $K_3$ over $K$
 - $F_2$, the splitting polynomial of $K_6$ over $K_3$
 - $\vec{r}$, a collection of radices used in the implementation of $H_1$
 - $\vec{Q}'$, a collection of points over $E/K_3$
 - $c$, the value of $e_m(P, S)$
 From this, it can already be seen that the implementation of PIDGENX's signature scheme is different from the patent.
 ### PIDGENX Verification
 The first major departure from the patent is that $M$, the message, is never actually directly provided in practice. 
 $M$ normally contains some digits of the product ID associated with a product key, as well as presumed authentication and upgrade bits, but the value of $M$ is never exposed to the user.
 This is because **all product keys are hashes of $M$**. 
 The hash function used to derive product keys is unknown as of yet, and will be referred to as $HASH(M)$. 
 $HASH(M)$ is decoded from the product key through conversion from base-24, with the base alphabet being `BCDFGHJKMPQRTVWXY2346789`. 
 The value of $HASH(M)$ is given a basic check against $\vec{r}$ by checking that $\left \lfloor {\frac{HASH(M)}{p}} \right \rfloor \leq (r_2 + 1)(r_3 + 1)$.
 Another important note is the binary serialization of elements of extension fields. 
 Given an extension field $K \supseteq L$, with $u$ being the primitive element of the extension, all elements $w \in K$ can be expressed as
 $$ w = z_0 + z_1 u + z_2 u^2 + \ldots + z_{n-1} u^{n-1} $$
 where $n$ is the degree of the extension and $z_i \in L$.
 This encoding method is used to represent elements of an extension field as arrays of elements in the base field, and is used to encode the coordinates of elliptic curve points as well.
 The patent also describes that points in $E/K_6$ can be represented as points in $E/K_3$ to reduce storage space.
 This compression method is used on the points in $\vec{Q}'$, and must be undone before pairing computation to retrieve $\vec{Q}$.
 The method to undo this compression is implemented as follows:
 1. Let $t$ be the primitive element of the extension from $K_3$ to $K_6$.
 2. Let $d = t^2$
 3. Define $E' = x^3 + d^2 ax + d^3 b$
 4. Define the untwist mapping $\tau : E' \rightarrow E = (x,y) \rightarrow (t^{-2}x, t^{-3}y)$
 5. Given a point $R' \in E/K_3$, recover $R = \tau(R')$
 Since the quadratic twist on $E$ is preserved through point addition, the untwist mapping can alternatively be done on the final point $Q$ to save time.
 Next, the value of $H_1(M)$ is found through a meet-in-the-middle tree search. This is necessary since the value of $M$ is unavailable. 
 Some of the elements of $H_1(M)$ are known in advance, as for $\vec{v} = H_1(M)$ and $k = \left \lfloor {\frac{HASH(M)}{p}} \right \rfloor$, $v_1 = 1$, $v_2 = k \bmod (r_2 + 1)$, and $v_3 = \left \lfloor {\frac{k}{r_2 + 1}} \right \rfloor$. 
 The point $T = H_2(M)$ is found from $h = HASH(M)$ as $H_2(M) = lift_x\left(h \bmod p\right)$, where the function $lift_x(x)$ finds a point on $E/K$ with the x-coordinate $x$. 
 Note that two points meet this definition, differing by the sign of their y-coordinate, the correct point is chosen by an unknown hash algorithm.
 Afterwards, the value of M is recovered through the following algorithm, with $\vec{v} = H_1(M)$ and $\vec{r}$ from the public key:
 ```py
 M = 0
 # Indexing backwards from n ... 1, ignoring the first element
 for i in range(len(r) - 1, 0, -1):
    M *= r[i]
    M += v[i]
 ```
 The final value of M can then be interpreted with the following bitfields:
 ```c
 struct DECODED_PKEY {
    uint auth : 10; // presumed to be authentication bits
    uint pid : 30; // middle 9 digits of Product ID
    bool upgrade : 1; // presumed to indicate upgrade keys
 }
 ```
 Finally, the reference pairing value of $c = e_m(P, S)$ is stored in the public key rather than the points $P$ and $S$, most likely to reduce storage space and computation costs.
 ## Practical Generation
 Although much of the information involved in generating a signature is not provided in public keys, 
 it turns out that all of the information necessary to generate signatures is theoretically recoverable.
 The main property of PIDGENX public keys that makes key generation theoretically possible is that the pairing value $c$ is constant.
 Since $P$ is a constant, and the value of $c$ depends on both $P$ and $S$, $S$ must also be effectively constant.
 This allows generation of product keys without needing to know the implementation of $H_2$, assuming that $\vec{\alpha}$ is known and one valid, decoded product key is available per public key.
 Letting $M$ be the message of the known product key and $h = HASH(M)$, $S$ can be recovered like so:
 $$ T = lift_x(h \bmod p) $$
 $$ \vec{v} = H_1(M) $$
 $$ \alpha = \sum v_i \alpha_i \pmod{m} $$
 $$ S = \alpha T $$
 And now with the value of $S$ known, the points $T$ can be generated without using $H_2$.
 This allows key generation via the following algorithm, for a given message $M$:
 $$ \vec{v} = H_1(M) $$
 $$ \alpha = \sum v_i \alpha_i \pmod{m} $$
 $$ \beta = \alpha^{-1} \pmod{m} $$
 $$ T = \beta S $$
 And now, the value of $HASH(M)$ can be recovered from $\vec{v}$ and the x-coordinate of $T$, then encoded into a product key.
 ## Current Plans
 As shown above, the only thing hindering product-key generation is finding the private key vector $\vec{\alpha}$, which requires the use of an ECDLP solver. 
 We are currently in the process of creating such a solver, since readily available solvers do not support the finite field arithmetic required for our case.
 The group sizes at play are 112-bit, which are [within the realm of possibility](https://www.joppebos.com/presentations/112bitECDLP.pdf), especially given the [capabilities of modern GPUs](https://en.wikipedia.org/wiki/Discrete_logarithm_records#Elliptic_curves). We plan to recruit volunteers for this purpose and will
 be releasing a distributed solver client to let anyone participate, so stay tuned!
 ## Credits/Thanks
 - diamondggg, for providing the steps of the algorithm we hadn't reverse-engineered
 - CONIGUERO, Neo-Desktop, Endermanch, and TheTank20, for assisting in the reverse-engineering process
 - Microsoft, for making cool software worth researching :heart: