I like posts.

  • 0 Posts
  • 7 Comments
Joined 1 year ago
Cake day: July 8th, 2023

  • The text does technically give the reason on the first page:

    It is not a regular language and hence cannot be parsed by regular expressions.

    Here, “regular language” is a technical term, and the statement is correct.

    The text goes on to discuss Perl regexes, which I think are able to parse at least all languages in LL(*). I’m fairly sure that is sufficient to recognize XML, but am not quite certain about HTML5. The WHATWG standard doesn’t define HTML5 syntax with a grammar, but with a stateful parsing procedure which defies normal placement in the Chomsky hierarchy.

    This, of course, is the real reason: even if such a regex is technically possible with some regex engines, creating it is extremely exhausting and each time you look into the spec to understand an edge case you suffer 1D6 SAN damage.
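
    To make the “regular language” point concrete, here is a minimal C++ sketch on a toy language of nested <a>...</a> pairs (not real XML or HTML). The function names is_well_nested and matches_up_to_depth_two are made up for illustration. A pattern that uses only concatenation and repetition can be written for a fixed maximum depth, while a recognizer for arbitrary depth needs a counter:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Toy language: well-nested <a>...</a> pairs, e.g. "<a><a></a></a>".
// Accepting arbitrary depth needs a counter, i.e. unbounded memory.
bool is_well_nested(const std::string& s) {
    std::size_t pos = 0;
    std::size_t depth = 0;
    while (pos < s.size()) {
        if (s.compare(pos, 4, "</a>") == 0) {
            if (depth == 0) return false;  // closing tag without an opener
            --depth;
            pos += 4;
        } else if (s.compare(pos, 3, "<a>") == 0) {
            ++depth;
            pos += 3;
        } else {
            return false;  // anything else is outside the toy language
        }
    }
    return depth == 0;
}

// A pattern using only concatenation and repetition, hard-coded for depth <= 2.
bool matches_up_to_depth_two(const std::string& s) {
    static const std::regex pattern("(<a>(<a></a>)*</a>)*");
    return std::regex_match(s, pattern);
}

void example_usage() {
    const std::string depth_two = "<a><a></a></a>";
    const std::string depth_three = "<a><a><a></a></a></a>";
    assert(is_well_nested(depth_two) && matches_up_to_depth_two(depth_two));
    assert(is_well_nested(depth_three));
    assert(!matches_up_to_depth_two(depth_three));  // one level too deep
}
```

    Extending the pattern to depth three, four, and so on is always possible, but no single pattern of this kind covers every depth. Engines with recursion or similar extensions are no longer “regular” in the technical sense.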


  • For a project like Signal, there are competing aspects of security:

    • privacy and anonymity: keep as little identifiable information around as possible. This can be a life or death thing under repressive governments.

    • safety and anti-abuse: reliably block bad actors such as spammers, and make it possible for users to reliably block specific people (e.g. a creepy stalker). This is really important for Signal to have a chance at mass appeal (which in turn makes it less suspicious to have Signal installed).

    Phone number verification is the state-of-the-art approach to make it more expensive for bad actors to create thousands of burner accounts, at the cost of preventing fully anonymous participation (depending on the difficulty of getting a prepaid SIM in your country).

    Signal points out that sending verification SMS is actually one of its largest cost centers, currently accounting for 6M USD out of their 14M USD infrastructure budget: https://signal.org/blog/signal-is-expensive/

    I’m sure they would be thrilled if there were cheaper anti-abuse measures.


  • This article is ahistoric and unnecessarily conspiratorial.

    Signal and its predecessors like TextSecure have been run by different companies/organizations:

    • Whisper Systems
    • Open Whisper Systems
    • Signal Technology Foundation (and its subsidiary Signal Messenger LLC)

    Open Whisper Systems received about 3M USD total from the US government via the Open Technology Fund for the purpose of technology development … during 2013 to 2016. Source: archive of the OTF website: https://web.archive.org/web/20221015073552/https://www.opentech.fund/results/supported-projects/open-whisper-systems/

    The Signal Foundation (founded 2018) was started by a 105M USD interest-free loan from Brian Acton, known for co-founding WhatsApp and selling it to Facebook (now Meta).

    So, the key insights:

    • It doesn’t seem like the Signal Foundation received US government funding. (Though I haven’t checked financial statements.)
    • The US government funding seems to be a thing of the fairly distant past (2016). The article makes it sound like the funding was just pulled this year.
    • The US government funding was small compared to Signal’s current annual budget. It was not small at the time, but now Signal regularly makes more from licensing its technology than it regularly received from the US government. According to ProPublica, Signal’s financial statements for 2022 indicate revenue of about 26M USD.

  • Cryptography works. At least until sufficiently powerful quantum computers arrive, TLS reliably ensures confidentiality between your browser and the server. No one else can snoop on the data transmitted via that connection.

    But are you connected to the right server? Without some kind of authentication, any adversary in the middle (such as your ISP) could impersonate the real server.

    That is where certificates come in. They are issued by neutral certificate authorities (CAs) that check the identity. It works something like this:

    • I, the server operator, create a private key on that server. I use that key to create a certificate request which asks the CA to give me a certificate. This request also contains the domain names for which the key shall be used.
    • The CA performs identity checks.
    • The CA issues me the certificate. I install it on my server. Now, when browsers create a TLS connection I can tell them: here’s my public key you can use to check my identity, and here’s a certificate that shows that this is a valid key for this domain name!
    • The browser will validate the certificate and see if the domain name matches one of the names in the certificate.
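
    The last step, matching the domain name against the names in the certificate, can be sketched roughly as follows. This is a simplified illustration with hypothetical helper names (name_matches, certificate_covers). It assumes a wildcard only ever appears as the whole leftmost label, and it leaves out everything else a real browser checks, such as the signature chain and the validity period.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Domain names compare case-insensitively, so normalize first.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Simplified rule (an assumption of this sketch): either an exact match,
// or "*.example.org" covering exactly one additional leftmost label.
bool name_matches(const std::string& pattern_in, const std::string& host_in) {
    const std::string pattern = to_lower(pattern_in);
    const std::string host = to_lower(host_in);
    if (pattern.rfind("*.", 0) != 0) {
        return pattern == host;  // no wildcard: names must be identical
    }
    const std::string suffix = pattern.substr(1);  // ".example.org"
    const std::size_t dot = host.find('.');
    if (dot == std::string::npos || dot == 0) return false;
    return host.substr(dot) == suffix;
}

// True if any name listed in the certificate covers the requested host.
bool certificate_covers(const std::vector<std::string>& names,
                        const std::string& host) {
    return std::any_of(names.begin(), names.end(), [&](const std::string& n) {
        return name_matches(n, host);
    });
}

void example_usage() {
    const std::vector<std::string> names = {"example.org", "*.example.org"};
    assert(certificate_covers(names, "WWW.example.org"));
    assert(certificate_covers(names, "example.org"));
    assert(!certificate_covers(names, "a.b.example.org"));  // two extra labels
    assert(!certificate_covers(names, "example.com"));
}
```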

    What kind of checks are done depends on the CA. I’ve obtained certificates by appearing in person at a counter, showing my government ID, and filling out a form. Nowadays, the ACME protocol is more common; it enables automated certificate issuance. With ACME, the CA connects to the server from multiple network locations (making interception unlikely) and checks whether the server provides a certain authentication token.

    To know which certificates are valid, browsers must know which CAs are trusted. Browser makers and CAs have come together to create an evolving standard of minimum requirements that CAs must fulfill to be eligible for inclusion in the browser’s default trust store. If a CA violates this (for example by creating certificates that can be used for government traffic interception, or by creating a certificate without announcing it in a public transparency list), then future browser versions will remove them, making all their certificates worthless.

    eIDAS 2 has the effect of circumventing all of this. There is to be a government-controlled CA (already high-risk) that has its own verification rules set by legislation (does not meet industry standard rules). And browsers would be legally forced to include the eIDAS CAs as “trusted”.

    This puts browsers in a tough spot because they’ve resisted these kinds of requests from authoritarian regimes in the past. But now the world’s largest trade bloc is asking. Browsers can comply, leave the EU market, or maybe provide a less secure EU edition? This awakens uncomfortable memories of the failed US attempts at cryptography export control (where cryptography was treated as a munition, like hand grenades or ballistic missiles).

    It is plausible that the EU is doing this with good intentions: having a digital identity scheme is useful, it makes sense for identity to be government-controlled (like passports), and such a scheme will only see success if it sees industry adoption. The EU has also seen that hoping for voluntary industry adoption doesn’t generally work, e.g. see the USB-C mandate.



  • On the other hand, the GDPR’s concept of “personal data” is extremely broad, much more so than the US concept of PII. Personal data is any information relating to an identifiable person. Pseudonymous info is still personal under this definition. Online usernames or social media handles are identifiers, and any linked info (e.g. posts, comments, likes) is personal data as well.

    So Lemmy and other Fediverse stuff is well within the GDPR’s material scope.

    However, the GDPR’s “right to erasure” (“right to be forgotten”) is more nuanced. It doesn’t always apply (though it usually does). OP very likely has the right to request deletion from individual instances.

    Posts have been published through federation. The GDPR anticipates this (I think in Art 17(2)): if personal data has been made public by the data controller, and erasure is requested, then the data controller is obliged to take reasonable steps to notify other controllers of this.

    The ActivityPub protocol has built-in support for sending out such deletion notifications, and last time I checked Lemmy implements this. Of course the receiving instance might not honor this, but that’s outside of the responsibility of the initial data controller.

    While I’m not entirely convinced that everything here is 100% compliant, federation is less of a compliance issue than it might seem.


  • C++ does have the problem that references are not objects, which introduces many subtle issues. For example, you cannot use a type like std::vector<int&>, so that templated code will often have to invoke std::remove_reference<T> and so on. Rust opts for a more consistent data model, but then introduces auto-deref (and the Deref trait) to get about the same usability C++ has with references and operator->. Note that C++ will implicitly chain operator-> calls until a plain pointer is reached, whereas Rust will stop dereferencing once a type with a matching method/field is found. Having deep knowledge of both languages, I’m not convinced that C++ features “straightforward consistency” here…
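
    A minimal sketch of the C++ side of this, assuming C++14 or later. All function and type names here (bump_all, single_element, Inner, Handle, Outer, read_through) are invented for the example. It shows the usual std::reference_wrapper stand-in for the ill-formed std::vector<int&>, a template that has to strip the reference from T before using it as an element type, and operator-> being chained until a plain pointer is reached:

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>
#include <vector>

// std::vector<int&> does not compile; std::reference_wrapper<int> is an
// object that behaves like a rebindable reference and can be stored.
int bump_all(std::vector<std::reference_wrapper<int>> refs) {
    int sum = 0;
    for (int& r : refs) {  // implicit conversion reference_wrapper -> int&
        r += 1;
        sum += r;
    }
    return sum;
}

// With a forwarding reference, T is deduced as int& for lvalues, so the
// element type must be cleaned up before it can be used in a container.
template <class T>
auto single_element(T&& x) {
    using Elem = std::remove_cv_t<std::remove_reference_t<T>>;
    std::vector<Elem> v;
    v.push_back(std::forward<T>(x));
    return v;
}

struct Inner { int value; };
struct Handle {
    Inner* target;
    Inner* operator->() const { return target; }  // plain pointer: chain ends
};
struct Outer {
    Handle handle;
    Handle operator->() const { return handle; }  // class type: chain continues
};

// o->value calls Outer::operator->, then Handle::operator->, then
// dereferences the resulting Inner*.
int read_through(const Outer& o) { return o->value; }

void example_usage() {
    int a = 1, b = 2;
    assert(bump_all({std::ref(a), std::ref(b)}) == 5);
    assert(a == 2 && b == 3);  // the originals were modified

    int n = 7;
    auto v = single_element(n);  // T is int&, Elem is int
    static_assert(std::is_same<decltype(v), std::vector<int>>::value, "");
    assert(v.size() == 1 && v[0] == 7);

    Inner inner{42};
    Outer outer{Handle{&inner}};
    assert(read_through(outer) == 42);
}
```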