Notes from the 7/28/2022 meeting of TFF collaborators

New people
Let’s all join the Discord server so conversations can happen interactively
- Ping Krzys to become a Contributor to be able to post
SIG Federated
Discussion of free-riding and data poisoning in cross-silo settings, led by LinkedIn (context comes from use cases identified by LinkedIn unless specified otherwise):
- Free-riding - certain tenants do not contribute to the group, diluting the benefit
  - Could be intentional or unintentional
  - Focus on the unintentional at this point - this is the case we’re interested in at LinkedIn primarily
  - Could be as simple as a participant not having enough data, or having data that is not useful in training
    - Currently thinking of modeling this as an anomaly detection problem
    - Comparing against the majority contribution works if this is the case for only a minority of the data
    - Another approach: multiple federated models, built with or without contributions from a given participant; observe which ones make progress, and exclude participants based on that
  - Some freeriders could be contributing garbage data
    - Harder to model as anomaly detection
    - Same approach as above
- Poisoning
  - Likewise, could be intentional or not
  - Focus on the unintentional - larger tenants can overwhelm the group and bias the model towards their contributions
  - For scenarios of interest, this bears similarities to the freerider problem
  - Relevant techniques exist in distributed Byzantine training
    - E.g., instead of average, could adopt a median to add some robustness against poisoning
- Do we see these problems occurring elsewhere? Is it worth contributing such logic to the ecosystem?
  - Yes! These are common problems in adversarial settings, where silos’ interests may not be aligned (contributions incur computation cost and require resources)
- How can we measure the impact of freeloading or poisoning?
  - Per contribution vs. in aggregate - ideas above point to the latter
- Observation: one of the features of TFF is parameterizable and stateful aggregations that can maintain their own internal state and update that state as they aggregate.
  - E.g. federated_aggregate
- Thoughts on the tradeoffs and synergies with other goals (e.g., DP)
  - DP can definitely help with poisoning
  - Question about DP in the context of freeloading - still an open question
- We found data poisoning attacks could have negligible impact
  - E.g., see https://arxiv.org/pdf/2108.10241.pdf
  - Important to provide such a feature as a part of a cross-silo FL platform regardless of magnitude of impact
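A minimal sketch of two ideas mentioned above: replacing the average with a median for robustness, and comparing each contribution against the majority to spot anomalies. It is plain Python rather than TFF code. The function names, the distance-based flagging rule, the threshold, and the numbers are all assumptions made for illustration, not something proposed in the meeting.

```python
from statistics import mean, median


def aggregate(updates, reducer):
    """Combine per-silo update vectors coordinate by coordinate."""
    return [reducer(coords) for coords in zip(*updates)]


def flag_outliers(updates, threshold=3.0):
    """Return indices of silos whose update is far from the majority.

    Hypothetical rule: measure each silo's Euclidean distance to the
    coordinate-wise median and flag it when that distance exceeds
    `threshold` times the median distance across all silos.
    """
    center = aggregate(updates, median)
    dists = [
        sum((u - c) ** 2 for u, c in zip(update, center)) ** 0.5
        for update in updates
    ]
    typical = median(dists)
    return [i for i, d in enumerate(dists) if d > threshold * typical]


# Four silos send similar updates; the fifth is far off (made-up numbers).
updates = [
    [1.0, 2.0],
    [1.1, 1.9],
    [0.9, 2.1],
    [1.0, 2.0],
    [50.0, -40.0],
]

mean_update = aggregate(updates, mean)      # pulled toward the fifth silo
median_update = aggregate(updates, median)  # stays with the majority
suspects = flag_outliers(updates)           # indices of flagged silos
```

With these numbers the mean is dragged to roughly [10.8, -6.4], the median stays at [1.0, 2.0], and only the fifth silo (index 4) is flagged. As noted in the discussion, this kind of comparison only works while the anomalous contributions are a minority.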
A write-up from LinkedIn is upcoming, with more details on the ideas above and proposals for components to add to the TFF ecosystem
See more discussion on Discord
Next meeting in 2 weeks