
"AI developers have long faced a dilemma: The more training data you feed a large language model (LLM), the more fluent and human-like its output will be. However, at the same time, you run the risk of including sensitive personal information in that dataset, which the model could then republish verbatim, leading to major security compromises for the individuals affected and damaging PR scandals for the developers."
"The key ingredient behind VaultGemma is a mathematical framework known as differential privacy (DP), which is essentially digital noise that scrambles the model's ability to perfectly memorize information found in its training data. Crucially, the researchers embedded DP at the level of sequences of tokens. This means that at the most fundamental level, VaultGemma will not be able to perfectly memorize or reproduce the details on which it's been trained."
VaultGemma is a large language model trained with differential privacy (DP) embedded at the level of token sequences to reduce verbatim memorization. The DP mechanism injects calibrated noise during training, limiting the model's ability to reproduce exact sequences from its training data, including any sensitive personal information they may contain. This sequence-level protection prevents perfect memorization or reproduction of training details while aiming to preserve fluent, human-like output. The approach trades some model utility for user privacy, reducing the risk of security compromises for affected individuals and reputational damage for developers, though further research and engineering are needed to refine and scale the method.
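To make the mechanism concrete, below is a minimal sketch of the standard recipe for differentially private training, DP-SGD: clip each sequence's gradient to bound its influence, then add Gaussian noise before the update. This is illustrative only, not VaultGemma's actual code; the toy linear model, the function of each step, and the hyperparameter values (CLIP_NORM, SIGMA, LR) are all assumptions for demonstration. A real system would also track a formal privacy budget (epsilon, delta) with a privacy accountant, which this sketch omits.

```python
# Illustrative DP-SGD-style training loop on a toy linear model.
# Assumed hyperparameters; NOT VaultGemma's actual implementation.
import numpy as np

rng = np.random.default_rng(seed=0)

CLIP_NORM = 1.0   # per-sequence L2 clipping bound C (assumed value)
SIGMA = 1.1       # noise multiplier: larger sigma = stronger privacy, noisier updates
LR = 0.1          # learning rate (assumed value)
STEPS = 500

# Toy data: each row stands in for one training sequence's contribution.
X = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=256)

w = np.zeros(4)
for _ in range(STEPS):
    # Per-sequence gradients of squared error: g_i = 2 * (x_i . w - y_i) * x_i
    residuals = X @ w - y
    grads = 2.0 * residuals[:, None] * X  # shape (n_sequences, n_features)

    # 1) Clip each sequence's gradient so no single sequence can
    #    dominate the update (bounds each sequence's influence).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, CLIP_NORM / np.maximum(norms, 1e-12))

    # 2) Sum the clipped gradients, then add Gaussian noise calibrated to
    #    the clipping bound. The noise is what scrambles the model's
    #    ability to memorize any one sequence exactly.
    noisy_sum = grads.sum(axis=0) + rng.normal(scale=SIGMA * CLIP_NORM, size=4)

    # 3) Average and take an ordinary gradient step.
    w -= LR * noisy_sum / len(X)

print("recovered weights:", np.round(w, 2))  # approaches true_w despite the noise
```

The design trade-off the article describes is visible in SIGMA: raising it strengthens the privacy guarantee but degrades how closely the model can fit its training data, which is exactly the utility-versus-privacy balance VaultGemma aims to manage.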
Read at ZDNET