mirror of https://github.com/harvard-edge/cs249r_book.git, synced 2026-03-11 17:49:25 -05:00
Food for thought, sketching out the data engineering section
@@ -1,17 +1,17 @@
# Data Engineering

<!--


## Introduction

[//]: # Explanation: This section establishes the groundwork, defining data engineering and explaining its importance and role in Embedded AI. A well-rounded introduction will help in establishing the foundation for the readers.
Explanation: This section establishes the groundwork, defining data engineering and explaining its importance and role in Embedded AI. A well-rounded introduction will help in establishing the foundation for the readers.

- Definition and Importance of Data Engineering in AI
- Role of Data Engineering in Embedded AI
- Synergy with Machine Learning and Deep Learning

## Problem Definition
## Problem

Explanation: This section is a crucial starting point in any data engineering project, as it lays the groundwork for the project's trajectory and ultimate success. Here's a brief explanation of why each subsection within the "Problem Definition" is important:

- Identifying the Problem
- Setting Clear Objectives
- Benchmarks for Success
@@ -19,7 +19,7 @@

## Data Sourcing

[//]: # Explanation: This section delves into the first step in data engineering - gathering data. Understanding various data types and sources is vital for developing robust AI systems, especially in the context of embedded systems where resources might be limited.
Explanation: This section delves into the first step in data engineering - gathering data. Understanding various data types and sources is vital for developing robust AI systems, especially in the context of embedded systems where resources might be limited.

- Data Sources
- Data Types: Structured, Semi-Structured, and Unstructured
@@ -27,7 +27,7 @@

## Data Storage and Management

[//]: # Explanation: Data must be stored and managed efficiently to facilitate easy access and processing. This section would provide insights into different data storage options and their respective advantages and challenges in embedded systems.
Explanation: Data must be stored and managed efficiently to facilitate easy access and processing. This section would provide insights into different data storage options and their respective advantages and challenges in embedded systems.

- Database Selection: SQL vs NoSQL
- Data Warehousing
@@ -36,7 +36,7 @@

## Data Processing

[//]: # Explanation: Data processing is a pivotal step in transforming raw data into a usable format. This section provides a deep dive into the necessary processes, including cleaning, integration, and establishing data pipelines, all crucial for streamlining operations in embedded AI systems.
Explanation: Data processing is a pivotal step in transforming raw data into a usable format. This section provides a deep dive into the necessary processes, including cleaning, integration, and establishing data pipelines, all crucial for streamlining operations in embedded AI systems.

- Data Cleaning and Transformation
- Data Integration
@@ -45,7 +45,7 @@

## Data Quality

[//]: # Explanation: Ensuring data quality is critical to developing reliable AI models. This section outlines various strategies to maintain and assess data quality.
Explanation: Ensuring data quality is critical to developing reliable AI models. This section outlines various strategies to maintain and assess data quality.

- Data Validation
- Handling Missing Values
@@ -53,7 +53,7 @@

## Feature Engineering

[//]: # Explanation: Feature engineering involves selecting and transforming variables to improve the performance of AI models. It's vital in embedded AI systems where computational resources are limited, and optimized feature sets can significantly improve performance.
Explanation: Feature engineering involves selecting and transforming variables to improve the performance of AI models. It's vital in embedded AI systems where computational resources are limited, and optimized feature sets can significantly improve performance.

- Importance of Feature Engineering
- Techniques of Feature Selection
@@ -62,7 +62,7 @@

## Data Labeling

[//]: # Explanation: Labeling is an essential part of preparing data for supervised learning. This section focuses on various strategies and tools available for data labeling, a vital process in the data preparation phase.
Explanation: Labeling is an essential part of preparing data for supervised learning. This section focuses on various strategies and tools available for data labeling, a vital process in the data preparation phase.

- Manual Data Labeling
- Automated Data Labeling
@@ -70,21 +70,21 @@

## Data Version Control

[//]: # Explanation: Version control is critical for managing changes and tracking versions of datasets during the development of AI models, facilitating reproducibility and collaboration.
Explanation: Version control is critical for managing changes and tracking versions of datasets during the development of AI models, facilitating reproducibility and collaboration.

- Version Control Systems
- Data Versioning in ML Projects

## Optimizing Data for Embedded AI

[//]: # Explanation: This section concentrates on optimization techniques specifically suited for embedded systems, focusing on strategies to reduce data volume and enhance storage and retrieval efficiency, crucial for resource-constrained embedded environments.
Explanation: This section concentrates on optimization techniques specifically suited for embedded systems, focusing on strategies to reduce data volume and enhance storage and retrieval efficiency, crucial for resource-constrained embedded environments.

- Data Reduction Techniques
- Optimizing Data Storage and Retrieval

## Challenges in Data Engineering

[//]: # Explanation: Understanding potential challenges can help in devising strategies to mitigate them. This section discusses common challenges encountered in data engineering, particularly focusing on embedded systems.
Explanation: Understanding potential challenges can help in devising strategies to mitigate them. This section discusses common challenges encountered in data engineering, particularly focusing on embedded systems.

- Scalability
- Data Security and Privacy
@@ -93,5 +93,3 @@
## Conclusion
- The Future of Data Engineering in Embedded AI
- Key Takeaways

-->