Notes from the 7/28/2022 meeting of TFF collaborators

  • New people
  • Let’s all join the Discord server to make interactive conversations easier
    • Ping Krzys to be made a Contributor so you can post
  • SIG Federated
  • Discussion of free-riding and data poisoning in cross-silo settings, led by LinkedIn (context comes from use cases LinkedIn has identified unless otherwise noted):
    • Free riding - certain tenants don’t contribute to the group, diluting the benefit for everyone else
      • Could be intentional or unintentional
      • Focus on the unintentional case for now - this is the one LinkedIn is primarily interested in
      • Could be as simple as a participant not having enough data, or having data that isn’t useful for training
        • Currently thinking of modeling this as an anomaly detection problem
        • Comparing against the majority contribution works if free riding affects only a minority of the data
        • Another approach: train multiple federated models, each built with or without a given participant’s contributions; observe which ones make progress and exclude participants accordingly
      • Some free riders could be contributing garbage data
        • Harder to model as anomaly detection
        • The same multiple-model approach as above applies
    • Poisoning
      • Likewise, could be intentional or not
      • Focus on the unintentional case - larger tenants can overwhelm the group and bias the model toward their contributions
      • In the scenarios of interest, this resembles the free-riding problem
      • Relevant techniques exist in Byzantine-robust distributed training
        • E.g., aggregating with a median instead of an average adds some robustness against poisoning
    • Do we see these problems occurring elsewhere, and is it worth contributing such logic to the ecosystem?
      • Yes! These problems are common in adversarial settings, where silos’ interests may not be aligned (contributions incur computation costs and require resources)
    • How can we measure the impact of freeloading or poisoning?
      • Per contribution vs. in aggregate - the ideas above point to the latter
    • Observation: one of TFF’s features is parameterizable, stateful aggregations that maintain their own internal state and update it as they aggregate
    • Thoughts on the trade-offs and synergies with other goals (e.g., DP)
      • DP can definitely help with poisoning
      • Question about DP in the context of freeloading - still an open question
    • We found data poisoning attacks could have negligible impact
  • Upcoming from LinkedIn: a write-up with more detail on the ideas above and proposals for components to add to the TFF ecosystem
  • See more discussion on Discord
  • Next meeting in 2 weeks
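The robustness ideas from the free-riding/poisoning discussion are comparing each silo against the majority contribution, building models with and without a given participant, and swapping the average for a median. The NumPy sketch below illustrates them. It is not TFF code and not LinkedIn’s proposal. It assumes each silo’s update is a flat NumPy vector and collapses "train multiple federated models" into a single aggregation round. All function names, the distance-based outlier score, and its threshold are hypothetical.

```python
import numpy as np


def mean_aggregate(updates):
    """Unweighted average of silo updates."""
    return np.mean(np.stack(updates), axis=0)


def weighted_mean_aggregate(updates, num_examples):
    """Average weighted by each silo's example count (FedAvg-style weighting).
    A silo with many more examples can dominate the result."""
    return np.average(np.stack(updates), axis=0, weights=num_examples)


def median_aggregate(updates):
    """Coordinate-wise median: each parameter is the median across silos,
    so a minority of outlying silos cannot drag it arbitrarily far."""
    return np.median(np.stack(updates), axis=0)


def flag_outliers(updates, threshold=3.0):
    """Hypothetical anomaly check: score each update by its L2 distance from
    the coordinate-wise median (a proxy for the "majority" contribution),
    normalized by the median of those distances. Returns flagged indices."""
    stacked = np.stack(updates)
    center = np.median(stacked, axis=0)
    dists = np.linalg.norm(stacked - center, axis=1)
    scale = max(float(np.median(dists)), 1e-12)  # avoid division by zero
    return [i for i, d in enumerate(dists) if d / scale > threshold]


def leave_one_out_deltas(updates, evaluate, aggregate=mean_aggregate):
    """Simplified single-round stand-in for "models built with or without a
    given participant": for each silo i, loss(all silos) - loss(all but i).
    A positive delta means excluding silo i lowered the loss.
    `updates` must be a Python list; `evaluate` is a hypothetical loss function."""
    full_loss = evaluate(aggregate(updates))
    deltas = []
    for i in range(len(updates)):
        rest = updates[:i] + updates[i + 1:]
        deltas.append(full_loss - evaluate(aggregate(rest)))
    return deltas


if __name__ == "__main__":
    # Toy data: three similar silos and one outlying silo (index 3).
    updates = [np.array([1.0, 1.0]), np.array([1.1, 0.9]),
               np.array([0.9, 1.1]), np.array([50.0, -50.0])]
    print("mean:  ", mean_aggregate(updates))
    print("median:", median_aggregate(updates))
    print("weighted mean (silo 3 has 100x the data):",
          weighted_mean_aggregate(updates, [1, 1, 1, 100]))
    print("flagged:", flag_outliers(updates))
    target = np.array([1.0, 1.0])  # hypothetical "good" model for scoring
    loss = lambda w: float(np.sum((w - target) ** 2))
    print("leave-one-out deltas:", leave_one_out_deltas(updates, loss))
```

The coordinate-wise median ignores how many examples each silo holds. A large tenant therefore cannot dominate it the way it dominates the example-weighted mean. The trade-off is that the median also discards some signal from well-behaved silos.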