mirror of https://github.com/harvard-edge/cs249r_book.git, synced 2026-05-07 10:08:50 -05:00
Updated referencing for figures in privacy_security.qmd
@@ -210,9 +210,9 @@ As the number of poisoned images on the internet increases, the performance of t
On the flip side, this tool can be used maliciously and can affect legitimate applications of the generative models. This goes to show the very challenging and novel nature of machine learning attacks.
Figure 1 shows example images from a poisoned model at different levels of poisoning:
@fig-poisoning shows example images from a poisoned model at different levels of poisoning:

{#fig-poisoning}
### Adversarial Attacks
@@ -304,9 +304,9 @@ Various physical tampering techniques can be used for fault injection. Low volta
For ML systems, consequences include impaired model accuracy, denial of service, extraction of private training data or model parameters, and reverse engineering of model architectures. Attackers could use fault injection to force misclassifications, disrupt autonomous systems, or steal intellectual property.
For example, in [@breier2018deeplaser], the authors successfully injected a fault attack into a deep neural network deployed on a microcontroller. They used a laser to heat up specific transistors, forcing them to switch states. In one instance, they used this method to attack a ReLU activation function, causing it to always output a value of 0, regardless of the input. In the assembly code in Figure 2, the attack caused the executing program to always skip the `jmp end` instruction on line 6. This means that `HiddenLayerOutput[i]` is always set to 0, overwriting any values written to it on lines 4 and 5. As a result, the targeted neurons are rendered inactive, resulting in misclassifications.
For example, in [@breier2018deeplaser], the authors successfully injected a fault attack into a deep neural network deployed on a microcontroller. They used a laser to heat up specific transistors, forcing them to switch states. In one instance, they used this method to attack a ReLU activation function, causing it to always output a value of 0, regardless of the input. In the assembly code in @fig-injection, the attack caused the executing program to always skip the `jmp end` instruction on line 6. This means that `HiddenLayerOutput[i]` is always set to 0, overwriting any values written to it on lines 4 and 5. As a result, the targeted neurons are rendered inactive, resulting in misclassifications.
![Figure 2: Assembly code demonstrating the impact of the fault-injection attack on a deep neural network deployed on a microcontroller. The fault-injection attack caused the program to always skip the `jmp end` instruction on line 6, causing `HiddenLayerOutput[i]` to be reset to zero. This overwrites any values written to it on lines 4 and 5, rendering certain neurons always inactive and resulting in misclassifications.](images/security_privacy/image3.png)
![Assembly code demonstrating the impact of the fault-injection attack on a deep neural network deployed on a microcontroller. The fault-injection attack caused the program to always skip the `jmp end` instruction on line 6, causing `HiddenLayerOutput[i]` to be reset to zero. This overwrites any values written to it on lines 4 and 5, rendering certain neurons always inactive and resulting in misclassifications.](images/security_privacy/image3.png){#fig-injection}
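To see why zeroing a layer's ReLU outputs is so damaging, consider a hypothetical software analogue of the fault: if `HiddenLayerOutput` is overwritten with zeros, the network's prediction collapses to whatever the output-layer biases favor, regardless of the input. The toy numpy network below is an illustration of that failure mode, not the microcontroller code attacked in [@breier2018deeplaser].

```python
# Hypothetical sketch of the fault's effect: forcing a hidden layer's ReLU
# outputs to zero, as if the 'jmp end' instruction were always skipped.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)   # toy weights, not from the paper
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)

def forward(x, faulted=False):
    hidden = np.maximum(x @ W1 + b1, 0.0)               # ReLU activation
    if faulted:
        hidden[:] = 0.0                                  # fault: HiddenLayerOutput[i] overwritten with 0
    logits = hidden @ W2 + b2
    return int(np.argmax(logits))

x = rng.normal(size=4)
print("prediction (normal): ", forward(x))
print("prediction (faulted):", forward(x, faulted=True))  # depends only on the bias b2 now
```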
The strategy for an attacker could be to infer information about the activation functions using side-channel attacks (discussed next). Then the attacker could attempt to target multiple activation function computations by randomly injecting faults into the layers that are as close to the output layer as possible. This increases the likelihood and impact of the attack.
@@ -332,17 +332,17 @@ Once the attacker has collected a sufficient number of traces, they would then u
Below is a simplified visualization of how analyzing the power consumption patterns of the encryption device can help us extract information about the algorithm's operations and, in turn, about the secret data. Say we have a device that takes a 5-byte password as input. We are going to analyze and compare the different voltage patterns that are measured while the encryption device is performing operations on the input to authenticate the password.
First, consider the power analysis of the device's operations after entering a correct password in the first picture in Figure 3. The dense blue graph is the output of the encryption device's voltage measurement. What matters here is the comparison between the different analysis charts rather than the specific details of what is going on in each scenario.
First, consider the power analysis of the device's operations after entering a correct password in the first picture in @fig-encryption. The dense blue graph is the output of the encryption device's voltage measurement. What matters here is the comparison between the different analysis charts rather than the specific details of what is going on in each scenario.

{#fig-encryption}
Now, let's look at the power analysis chart when we enter an incorrect password in Figure 4. The first three bytes of the password are correct. As a result, we can see that the voltage patterns are very similar or identical between the two charts, up to and including the fourth byte. After the device processes the fourth byte, it determines that there is a mismatch between the secret key and the attempted input. We notice a change in the pattern at the transition point between the fourth and fifth bytes: the voltage has gone up (the current has gone down) because the device has stopped processing the rest of the input.
Now, let's look at the power analysis chart when we enter an incorrect password in @fig-encryption2. The first three bytes of the password are correct. As a result, we can see that the voltage patterns are very similar or identical between the two charts, up to and including the fourth byte. After the device processes the fourth byte, it determines that there is a mismatch between the secret key and the attempted input. We notice a change in the pattern at the transition point between the fourth and fifth bytes: the voltage has gone up (the current has gone down) because the device has stopped processing the rest of the input.

{#fig-encryption2}
Figure 5 shows the power analysis chart for a completely wrong password. After the device finishes processing the first byte, it determines that it is incorrect and stops further processing: the voltage goes up and the current goes down.
@fig-encryption3 shows the power analysis chart for a completely wrong password. After the device finishes processing the first byte, it determines that it is incorrect and stops further processing: the voltage goes up and the current goes down.

{#fig-encryption3}
The example above shows how we can infer information about the encryption process and the secret key itself by analyzing different inputs and trying to 'eavesdrop' on the operations that the device performs on each byte of the input.
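The leak comes from the device doing less work the earlier the first wrong byte appears. Below is a hypothetical Python sketch of that behavior, using the amount of work performed as a stand-in for the power traces above; the secret and the guesses are made up. The standard defense is a constant-time comparison that processes every byte regardless of where a mismatch occurs.

```python
# Hypothetical sketch: an early-exit password check leaks how many leading
# bytes match (via work done / timing, analogous to the power traces above).
import hmac

SECRET = b"A7p#9"   # made-up 5-byte secret

def leaky_check(attempt: bytes) -> bool:
    work = 0
    for a, s in zip(attempt, SECRET):
        work += 1                      # one unit of 'processing' per byte examined
        if a != s:
            print(f"  processed {work} byte(s) before stopping")
            return False
    print(f"  processed {work} byte(s) (full input)")
    return len(attempt) == len(SECRET)

def constant_time_check(attempt: bytes) -> bool:
    # hmac.compare_digest examines every byte regardless of where a mismatch occurs
    return hmac.compare_digest(attempt, SECRET)

for guess in [b"A7p#9", b"A7pXX", b"XXXXX"]:   # correct, partly correct, completely wrong
    print(guess, "->", leaky_check(guess), "| constant-time:", constant_time_check(guess))
```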
@@ -454,7 +454,7 @@ The importance of TEEs in ML hardware security stems from their ability to prote
#### Mechanics
The fundamentals of TEEs are as follows. They contain four main parts:
The fundamentals of TEEs (see @fig-enclave) consist of four main parts:
- **Isolated Execution:** Code within a TEE runs in a separate environment from the device's main operating system. This isolation protects the code from unauthorized access by other applications.
@@ -464,7 +464,7 @@ The fundamentals of TEEs are as follows. They contain four main parts:
- **Data Encryption:** Data handled within a TEE can be encrypted, making it unreadable to entities without the proper keys, which are also managed within the TEE.
Here are some examples of trusted execution environments (TEEs) that provide hardware-based security for sensitive applications:
Here are some examples of TEEs that provide hardware-based security for sensitive applications:
- **[ARM TrustZone](https://www.arm.com/technologies/trustzone-for-cortex-m):** Creates secure and normal world execution environments isolated using hardware controls. Implemented in many mobile chipsets.
@@ -474,11 +474,11 @@ Here are some examples of trusted execution environments (TEEs) that provide har
- **[Apple Secure Enclave](https://support.apple.com/guide/security/secure-enclave-sec59b0b31ff/web):** TEE for biometric data and key management on iPhones and iPads. Facilitates mobile payments.

{#fig-enclave}
#### Trade-Offs
If TEEs are so good, why don't all systems have TEE enabled by default? The decision to implement a Trusted Execution Environment (TEE) is not taken lightly. There are several reasons why a TEE might not be present in all systems by default. Here are some trade-offs and challenges associated with TEEs:
If TEEs are so good, why don't all systems have TEE enabled by default? The decision to implement a TEE is not taken lightly. There are several reasons why a TEE might not be present in all systems by default. Here are some trade-offs and challenges associated with TEEs:
**Cost:** Implementing TEEs involves additional costs. There are direct costs for the hardware and indirect costs associated with developing and maintaining secure software for TEEs. These costs may not be justifiable for all devices, especially low-margin products.
@@ -518,9 +518,9 @@ Secure Boot helps protect embedded ML hardware in several ways:
#### Mechanics
A Trusted Execution Environment (TEE) benefits from Secure Boot in multiple ways. For instance, during initial validation, Secure Boot ensures that the code running inside the TEE is the correct and untampered version approved by the device manufacturer. It also ensures resilience against tampering: by verifying the digital signatures of the firmware and other critical components, Secure Boot prevents unauthorized modifications that could undermine the TEE's security properties. Secure Boot establishes a foundation of trust upon which the TEE can securely operate, enabling secure operations such as cryptographic key management, secure processing, and sensitive data handling.
TEEs benefit from Secure Boot (see @fig-secure-boot) in multiple ways. For instance, during initial validation, Secure Boot ensures that the code running inside the TEE is the correct and untampered version approved by the device manufacturer. It also ensures resilience against tampering: by verifying the digital signatures of the firmware and other critical components, Secure Boot prevents unauthorized modifications that could undermine the TEE's security properties. Secure Boot establishes a foundation of trust upon which the TEE can securely operate, enabling secure operations such as cryptographic key management, secure processing, and sensitive data handling.
![Figure 7: The Secure Boot flow of a trusted embedded system. Source: [@Rashmi2018Secure].](images/security_privacy/image4.png)
![The Secure Boot flow of a trusted embedded system. Source: [@Rashmi2018Secure].](images/security_privacy/image4.png){#fig-secure-boot}
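At its core, the validation step is a digital-signature check on the next boot stage before control is handed over. Below is a hypothetical, simplified sketch of that single check using the Python `cryptography` package (Ed25519) as an assumed stand-in; real secure boot chains keep the trusted public key (or its hash) in fused, immutable storage and verify every stage in sequence.

```python
# Hypothetical sketch of the core Secure Boot step: verify a firmware image's
# signature against a trusted public key before executing it.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# In a real device the manufacturer signs offline and only the public key
# (or its hash) is burned into the boot ROM; keys here are generated for the demo.
manufacturer_key = Ed25519PrivateKey.generate()
trusted_public_key = manufacturer_key.public_key()

firmware = b"\x7fELF...next-stage bootloader image..."   # made-up image contents
signature = manufacturer_key.sign(firmware)

def secure_boot(image: bytes, sig: bytes) -> None:
    try:
        trusted_public_key.verify(sig, image)   # raises InvalidSignature if tampered
        print("Signature valid: jumping to next boot stage.")
    except InvalidSignature:
        print("Signature invalid: halting boot.")

secure_boot(firmware, signature)                      # boots
secure_boot(firmware + b"\x00 tampered", signature)   # halts
```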
#### Case Study: Apple's Face ID
@@ -620,9 +620,9 @@ PUFs enable all this security through their challenge-response behavior's inhere
#### Mechanics
The working principle behind PUFs involves generating a "challenge-response" pair, where a specific input (the challenge) to the PUF circuit results in an output (the response) that is determined by the unique physical properties of that circuit. This process can be likened to a fingerprinting mechanism for electronic devices. Devices that utilize ML for processing sensor data can employ PUFs to secure communication between devices and prevent the execution of ML models on counterfeit hardware.
The working principle behind PUFs (see @fig-pfu) involves generating a "challenge-response" pair, where a specific input (the challenge) to the PUF circuit results in an output (the response) that is determined by the unique physical properties of that circuit. This process can be likened to a fingerprinting mechanism for electronic devices. Devices that utilize ML for processing sensor data can employ PUFs to secure communication between devices and prevent the execution of ML models on counterfeit hardware.
![Figure 8: PUF basics. a) A PUF exploits intrinsic random variation at the microscale or nanoscale. Such random variation resulting from uncontrollable fabrication processes can be conceptually thought of as a unique physical 'fingerprint' of a hardware device. b) Optical PUF. Under illumination from a given angle/polarization, complex interference occurs within an inhomogeneous transparent plastic token. Then, a two-dimensional (2D) speckle pattern is recorded using a charge-coupled device camera. The angle/polarization is treated as the challenge while the 2D speckle pattern is the response. c) APUF. Consisting of multiple stages (1 to k), the challenge bits (C) select two theoretically identical but practically unequal delay paths at each stage. At the end of the APUF, an arbiter judges whether the top path is faster or not, and reacts with a '1' or '0' response (r). A challenge bit of '0' means the two signals pass a given stage in parallel, whilst '1' means the two signals cross over. d) SRAM PUF. The mismatch of the threshold voltage Vth of the transistors determines the response. For example, if the Vth of M1 is slightly smaller than that of M2, then at power-up the transistor M1 starts conducting before M2; thus, the logic state at point A = '1'. This in turn prevents M2 from switching on. As a result, the SRAM power-up state prefers to be '1' (point A = '1', point B = '0'), which is the response, while the address of the memory cell is the challenge. WL, word line; BL, bit line. Source: [@Gao2020Physical].](images/security_privacy/image2.png)
![PUF basics. a) A PUF exploits intrinsic random variation at the microscale or nanoscale. Such random variation resulting from uncontrollable fabrication processes can be conceptually thought of as a unique physical 'fingerprint' of a hardware device. b) Optical PUF. Under illumination from a given angle/polarization, complex interference occurs within an inhomogeneous transparent plastic token. Then, a two-dimensional (2D) speckle pattern is recorded using a charge-coupled device camera. The angle/polarization is treated as the challenge while the 2D speckle pattern is the response. c) APUF. Consisting of multiple stages (1 to k), the challenge bits (C) select two theoretically identical but practically unequal delay paths at each stage. At the end of the APUF, an arbiter judges whether the top path is faster or not, and reacts with a '1' or '0' response (r). A challenge bit of '0' means the two signals pass a given stage in parallel, whilst '1' means the two signals cross over. d) SRAM PUF. The mismatch of the threshold voltage Vth of the transistors determines the response. For example, if the Vth of M1 is slightly smaller than that of M2, then at power-up the transistor M1 starts conducting before M2; thus, the logic state at point A = '1'. This in turn prevents M2 from switching on. As a result, the SRAM power-up state prefers to be '1' (point A = '1', point B = '0'), which is the response, while the address of the memory cell is the challenge. WL, word line; BL, bit line. Source: [@Gao2020Physical].](images/security_privacy/image2.png){#fig-pfu}
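The challenge-response flow can be sketched in software by letting a per-device random seed stand in for fabrication variation: the verifier enrolls challenge-response pairs once, then later authenticates the device by replaying one of them. This is a hypothetical simulation of the protocol, not a model of any real PUF circuit.

```python
# Hypothetical simulation of PUF-style challenge-response authentication.
# A per-device secret seed stands in for uncontrollable fabrication variation.
import hashlib
import secrets

class SimulatedPUF:
    def __init__(self):
        # stands in for the device's unique physical 'fingerprint'
        self._variation = secrets.token_bytes(32)

    def respond(self, challenge: bytes) -> bytes:
        # the same challenge always yields the same device-specific response
        return hashlib.sha256(self._variation + challenge).digest()

device = SimulatedPUF()

# Enrollment: the verifier records challenge-response pairs in a secure database.
crp_database = {}
for _ in range(4):
    c = secrets.token_bytes(16)
    crp_database[c] = device.respond(c)

# Authentication: replay a stored challenge and compare the response.
challenge, expected = next(iter(crp_database.items()))
print("genuine device:    ", device.respond(challenge) == expected)

counterfeit = SimulatedPUF()   # different 'fabrication variation'
print("counterfeit device:", counterfeit.respond(challenge) == expected)
```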
#### Challenges
@@ -708,13 +708,13 @@ a. Satisfied if a dataset minimizes the amount of per-user data while its mean p
a. Satisfied if a dataset minimizes the amount of per-user data while the minimum performance of individual user data is comparable to the minimum performance of individual user data in the original, unminimized dataset.
Performance-based data minimization can be leveraged in several machine learning settings, including movie recommendation algorithms and e-commerce settings. For different recommendation algorithms, global data minimization comparisons are shown in Figure 9.
Performance-based data minimization can be leveraged in several machine learning settings, including movie recommendation algorithms and e-commerce settings. For different recommendation algorithms, global data minimization comparisons are shown in @fig-minimization.

{#fig-minimization}
This is in comparison to per-user data minimization, shown in Figure 10:
This is in comparison to per-user data minimization, shown in @fig-minimization2:

{#fig-minimization2}
Global data minimization seems to be a much more feasible method of data minimization compared to per-user data minimization, given the much more significant difference in per-user losses between the minimized dataset and original dataset.
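As a toy illustration of the two criteria, the following hypothetical sketch (made-up ratings, a trivial per-user mean predictor) keeps only k ratings per user and compares the change in the average loss (the global criterion) with the change in the worst-off user's loss (the per-user criterion).

```python
# Hypothetical toy comparison of global vs. per-user data minimization.
# Each user's 'model' is just the mean of their retained ratings.
import numpy as np

rng = np.random.default_rng(0)
users = {u: rng.normal(loc=rng.uniform(1, 5), scale=1.0, size=50) for u in range(100)}

def per_user_loss(ratings, k=None):
    kept = ratings if k is None else ratings[:k]       # minimization: keep only k ratings
    prediction = kept.mean()
    return np.mean((ratings - prediction) ** 2)        # error over all of the user's data

for k in [None, 20, 5]:
    losses = np.array([per_user_loss(r, k) for r in users.values()])
    label = "full data" if k is None else f"k={k} ratings/user"
    print(f"{label:>18}: mean loss {losses.mean():.3f} (global), worst user {losses.max():.3f} (per-user)")
```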
@@ -732,13 +732,13 @@ With the rise of public use of generative AI models, including OpenAI's GPT4 and
##### Case Study
While ChatGPT has instituted protections to prevent people from accessing private and ethically questionable information, several individuals have successfully bypassed these protections through prompt injection and other security attacks. As demonstrated in Figure 11, users have been able to bypass ChatGPT's protections by asking it to mimic the tone of a "deceased grandmother" in order to learn how to bypass a web application firewall [@Gupta2023ChatGPT].
While ChatGPT has instituted protections to prevent people from accessing private and ethically questionable information, several individuals have successfully bypassed these protections through prompt injection and other security attacks. As demonstrated in @fig-role-play, users have been able to bypass ChatGPT's protections by asking it to mimic the tone of a "deceased grandmother" in order to learn how to bypass a web application firewall [@Gupta2023ChatGPT].

{#fig-role-play}
Further, users have also successfully used reverse psychology to manipulate ChatGPT and access information initially prohibited by the model. In Figure 12, a user is initially prevented from learning about piracy websites through ChatGPT but is easily able to bypass these restrictions using reverse psychology.
Further, users have also successfully used reverse psychology to manipulate ChatGPT and access information initially prohibited by the model. In @fig-role-play2, a user is initially prevented from learning about piracy websites through ChatGPT but is easily able to bypass these restrictions using reverse psychology.

{#fig-role-play2}
The ease with which ChatGPT can be manipulated by security attacks is concerning, given the private information it was trained on without consent. Further research on data privacy in LLMs and generative AI should focus on making models less susceptible to prompt injection attacks.
@@ -770,9 +770,9 @@ The Laplace Mechanism is one of the most straightforward and commonly used metho
While the Laplace distribution is common, other distributions like the Gaussian can also be used. Laplace noise is used for strict $\epsilon$-differential privacy on low-sensitivity queries, while Gaussian noise can be used when the guarantee is relaxed to ($\epsilon$, $\delta$)-differential privacy. In this relaxed version of differential privacy, epsilon and delta are parameters that define the strength of the privacy guarantee when releasing information or a model related to a dataset. Epsilon sets a bound on how much information can be learned about the data from the output, while delta allows for a small probability of the privacy guarantee being violated. The choice between Laplace, Gaussian, and other distributions depends on the specific requirements of the query and the dataset and the trade-off between privacy and accuracy.
To illustrate the trade-off of privacy and accuracy in (ϵ, 𝛿)-differential privacy, the following graphs in Figure 13 show the results on accuracy for different noise levels on the MNIST dataset, a large dataset of handwritten digits [@abadi2016deep]. An increasing delta value relaxes the privacy guarantee, so the noise level can be reduced. Since the data will retain many of its original characteristics, accuracy simultaneously increases with drawbacks on privacy preservation. This trade-off is fundamental to differential privacy.
To illustrate the trade-off of privacy and accuracy in ($\epsilon$, $\delta$)-differential privacy, the following graphs in @fig-tradeoffs show the results on accuracy for different noise levels on the MNIST dataset, a large dataset of handwritten digits [@abadi2016deep]. An increasing delta value relaxes the privacy guarantee, so the noise level can be reduced. Since the data will retain many of its original characteristics, accuracy simultaneously increases with drawbacks on privacy preservation. This trade-off is fundamental to differential privacy.

{#fig-tradeoffs}
The key points to remember about differential privacy are the following:
@@ -823,9 +823,9 @@ Multiple third-party audits have verified that Apple's system provides rigorous
Federated Learning (FL) is a type of machine learning where the process of building a model is distributed across multiple devices or servers, while keeping the training data localized. It was previously discussed in the [Model Optimizations](./optimizations.qmd) chapter, but we recap it briefly here for completeness and focus on the aspects that pertain to this chapter.
FL aims to train machine learning models across decentralized networks of devices or systems while keeping all training data localized. In FL, each participating device leverages its local data to calculate model updates which are then aggregated to build an improved global model. However, the raw training data itself is never directly shared, transferred, or compiled together. This privacy-preserving approach allows jointly developing ML models without centralizing the potentially sensitive training data in one place.
FL aims to train machine learning models across decentralized networks of devices or systems while keeping all training data localized (see @fig-fl-lifecycle). In FL, each participating device leverages its local data to calculate model updates which are then aggregated to build an improved global model. However, the raw training data itself is never directly shared, transferred, or compiled together. This privacy-preserving approach allows jointly developing ML models without centralizing the potentially sensitive training data in one place.
![Figure 14: The FL lifecycle from [@MAL-083]. The training data always remains on the client device; the model is repeatedly sent back and forth between the individual devices and the server for local updates and for compiling the global model, respectively.](images/security_privacy/image7.png)
![The FL lifecycle from [@MAL-083]. The training data always remains on the client device; the model is repeatedly sent back and forth between the individual devices and the server for local updates and for compiling the global model, respectively.](images/security_privacy/image7.png){#fig-fl-lifecycle}
One of the most common model aggregation algorithms is Federated Averaging (FedAvg), where the global model is created by averaging all of the locally updated parameters. While FedAvg works well with independent and identically distributed (IID) data, alternative algorithms like Federated Proximal (FedProx) are crucial in real-world applications where data is often non-IID. FedProx is designed for the FL process when there is significant heterogeneity in the client updates due to diverse data distributions across devices, computational capabilities, or varied amounts of data.
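The FedAvg aggregation step itself is just a data-size-weighted average of the clients' locally updated parameters. The snippet below is a hypothetical sketch of that server-side step with numpy arrays standing in for model weights; production FL systems add client sampling, secure aggregation, and communication handling on top.

```python
# Hypothetical sketch of the FedAvg server-side aggregation step.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average client model parameters, weighted by each client's data size."""
    total = sum(client_sizes)
    layers = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(layers)
    ]

# Three clients, each with a locally updated (toy) two-layer model.
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(4, 2)), rng.normal(size=2)] for _ in range(3)]
sizes = [100, 400, 500]                      # number of local training examples per client

global_model = fedavg(clients, sizes)
print("aggregated layer shapes:", [w.shape for w in global_model])
```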
@@ -883,9 +883,9 @@ Machine unlearning is a fairly new process, describing the methods in which the
#### Case Study
Some researchers demonstrate a real-life example of machine unlearning approaches applied to SOTA machine learning models by training an LLM, LLaMA2-7b, to unlearn any references to Harry Potter [@eldan2023whos]. Though this model took 184K GPU-hours to pretrain, it took only 1 GPU-hour of fine-tuning to erase the model's ability to generate or recall Harry Potter-related content, without noticeably compromising the accuracy of generating content unrelated to Harry Potter. Table 1 demonstrates how the model output changes before and after unlearning has occurred.
Some researchers demonstrate a real-life example of machine unlearning approaches applied to SOTA machine learning models by training an LLM, LLaMA2-7b, to unlearn any references to Harry Potter [@eldan2023whos]. Though this model took 184K GPU-hours to pretrain, it took only 1 GPU-hour of fine-tuning to erase the model's ability to generate or recall Harry Potter-related content, without noticeably compromising the accuracy of generating content unrelated to Harry Potter. @fig-hp-prompts demonstrates how the model output changes before and after unlearning has occurred.

{#fig-hp-prompts}
#### Other Uses
@@ -907,7 +907,7 @@ To use homomorphic encryption across different entities, carefully generated pub
Homomorphic encryption enables machine learning model training and inference on encrypted data, ensuring that sensitive inputs and intermediate values remain confidential. This is critical in healthcare, finance, genetics, and other domains increasingly relying on ML to analyze sensitive and regulated data sets containing billions of personal records.
Homomorphic encryption thwarts attacks like model extraction and membership inference that could expose private data used in ML workflows. It provides an alternative to trusted execution environments using hardware enclaves for confidential computing. However, current schemes have high computational overheads and algorithmic limitations that constrain real-world applications.
Homomorphic encryption thwarts attacks like model extraction and membership inference that could expose private data used in ML workflows. It provides an alternative to TEEs using hardware enclaves for confidential computing. However, current schemes have high computational overheads and algorithmic limitations that constrain real-world applications.
Homomorphic encryption realizes the decades-old vision of secure multiparty computation by allowing computation on ciphertexts. After being conceptualized in the 1970s, the first fully homomorphic crypto systems emerged in 2009, enabling arbitrary computations. Ongoing research is making these techniques more efficient and practical.
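To make "computing on ciphertexts" concrete, the sketch below uses the Paillier cryptosystem, a partially homomorphic scheme that supports adding ciphertexts and scaling them by plaintext constants, via the python-paillier (`phe`) package; the package and its API are an assumption for illustration. Fully homomorphic schemes extend this idea to arbitrary computations at much higher cost.

```python
# Hypothetical sketch: additive homomorphic encryption with Paillier,
# assuming the python-paillier package (`pip install phe`) is available.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# A client encrypts its private values before sending them to a server.
salaries = [52_000, 61_500, 48_250]
encrypted = [public_key.encrypt(s) for s in salaries]

# The server computes on ciphertexts only: sum, then scale by 1/n for the mean.
encrypted_total = sum(encrypted[1:], encrypted[0])
encrypted_mean = encrypted_total * (1 / len(salaries))

# Only the key holder can decrypt the result.
print("mean salary:", private_key.decrypt(encrypted_mean))
```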
@@ -937,7 +937,7 @@ For many real-time and embedded applications, fully homomorphic encryption remai
**Algorithmic Limitations:** Current schemes restrict the functions and depth of computations supported, limiting the models and data volumes that can be processed. Ongoing research is pushing these boundaries but restrictions remain.
**Hardware Acceleration:** To be feasible, homomorphic encryption requires specialized hardware like secure processors or co-processors with trusted execution environments. This adds design and infrastructure costs.
**Hardware Acceleration:** To be feasible, homomorphic encryption requires specialized hardware like secure processors or co-processors with TEEs. This adds design and infrastructure costs.
**Hybrid Designs:** Rather than encrypting entire workflows, selective application of homomorphic encryption to critical subcomponents can achieve protection while minimizing overheads.
@@ -1007,9 +1007,9 @@ The primary challenge of synthesizing data is to ensure adversaries are unable t
Researchers can freely share this synthetic data and collaborate on modeling without revealing any private medical information. Well-constructed synthetic data protects privacy while providing utility for developing accurate models. Key techniques to prevent reconstruction of the original data include adding differential privacy noise during training, enforcing plausibility constraints, and using multiple diverse generative models. Here are some common approaches for generating synthetic data:
- **Generative Adversarial Networks (GANs):** GANs (see Figure 15) are a type of AI algorithm used in unsupervised learning where two neural networks contest against each other in a game. The generator network is responsible for producing the synthetic data and the discriminator network evaluates the authenticity of the data by distinguishing between fake data created by the generator network and the real data. The discriminator acts as a metric on how similar the fake and real data are to one another. It is highly effective at generating realistic data and is, therefore, a popular approach for generating synthetic data.
- **Generative Adversarial Networks (GANs):** GANs (see @fig-gans) are a type of AI algorithm used in unsupervised learning where two neural networks contest against each other in a game. The generator network is responsible for producing the synthetic data and the discriminator network evaluates the authenticity of the data by distinguishing between fake data created by the generator network and the real data. The discriminator acts as a metric on how similar the fake and real data are to one another. It is highly effective at generating realistic data and is, therefore, a popular approach for generating synthetic data.

{#fig-gans}
- **Variational Autoencoders (VAEs):** VAEs are neural networks that are capable of learning complex probability distributions and balance between data generation quality and computational efficiency. They encode data into a latent space where they learn the distribution in order to decode the data back.
@@ -1055,7 +1055,7 @@ While all the techniques we have discussed thus far aim to enable privacy-preser
## Conclusion
Machine learning hardware security is a critical concern as embedded ML systems are increasingly deployed in safety-critical domains like medical devices, industrial controls, and autonomous vehicles. We have explored various threats spanning hardware bugs, physical attacks, side channels, supply chain risks and more. Defenses like trusted execution environments, secure boot, PUFs, and hardware security modules provide multilayer protection tailored for resource-constrained embedded devices.
Machine learning hardware security is a critical concern as embedded ML systems are increasingly deployed in safety-critical domains like medical devices, industrial controls, and autonomous vehicles. We have explored various threats spanning hardware bugs, physical attacks, side channels, supply chain risks and more. Defenses like TEEs, secure boot, PUFs, and hardware security modules provide multilayer protection tailored for resource-constrained embedded devices.
However, continual vigilance is essential to track emerging attack vectors and address potential vulnerabilities through secure engineering practices across the hardware lifecycle. As ML and embedded ML spread, maintaining rigorous security foundations that match the field's accelerating pace of innovation remains imperative.