mirror of
https://github.com/mohitmishra786/amILearningEnough.git
synced 2026-03-11 17:34:16 -05:00
Create bpftrace-eBPF-tools.md
src/resources/linux/bpftrace-eBPF-tools.md (new file, 378 lines)

# BPFtrace and eBPF Tools Guide

## Table of Contents

1. [Introduction](#introduction)
2. [System Layer Overview](#system-layer-overview)
3. [Tool Categories](#tool-categories)
4. [Detailed Tool Analysis](#detailed-tool-analysis)
5. [Command Reference](#command-reference)
6. [Performance Considerations](#performance-considerations)
7. [Common Use Cases](#common-use-cases)

## Introduction

This guide covers bpftrace and BCC (eBPF) command-line tools for analyzing and monitoring Linux performance at each layer of the system stack. Many of the tools exist in two versions: a BCC version, which accepts command-line options, and a bpftrace version (`*.bt`), which usually takes none.

## System Layer Overview

The tools are organized across these main layers:
1. Applications & Runtimes
2. System Libraries
3. System Call Interface
4. Kernel Subsystems:
   - VFS (Virtual File System)
   - Network Stack (Sockets, TCP/UDP, IP)
   - Scheduler
   - Virtual Memory
   - Device Drivers

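To see which probes a layer exposes on a given kernel, bpftrace can list them with `bpftrace -l`. The sketch below maps some of the layers above to probe patterns. The helper names (`layer_probe_pattern`, `list_layer_probes`) and the choice of one pattern per layer are illustrative assumptions, and the probes that actually match depend on the kernel version.

```bash
# Map a layer name to a bpftrace probe pattern.
# Illustrative helper, not part of bpftrace; one representative pattern per layer.
layer_probe_pattern() {
    case "$1" in
        syscall)   echo 'tracepoint:syscalls:*' ;;
        vfs)       echo 'kprobe:vfs_*' ;;
        network)   echo 'tracepoint:tcp:*' ;;
        scheduler) echo 'tracepoint:sched:*' ;;
        memory)    echo 'tracepoint:kmem:*' ;;
        block)     echo 'tracepoint:block:*' ;;
        *)         echo "unknown layer: $1" >&2; return 1 ;;
    esac
}

# List the probes for a layer. Requires root and an installed bpftrace.
list_layer_probes() {
    pattern=$(layer_probe_pattern "$1") || return 1
    bpftrace -l "$pattern"
}
```

For example, `list_layer_probes vfs` would print the `vfs_*` kernel functions that can be instrumented with kprobes.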
## Tool Categories

### Application Level Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| opensnoop | Trace file opens | Application |
| statsnoop | Trace stat() syscalls | Application |
| syncsnoop | Trace sync() syscalls | Application |
| bashreadline | Trace commands entered in bash | Application |
| gethostlatency | Latency of name resolution calls (DNS lookups) | System Libraries |

### System Call Interface Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| syscount | Count syscalls | System Call |
| execsnoop | Trace new processes | System Call |
| killsnoop | Trace kill() syscalls | System Call |
| pidpersec | New processes per second | System Call |

### File System Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| vfscount | Count VFS calls by kernel function | VFS |
| vfsstat | Per-interval counts of common VFS operations | VFS |
| writeback | Trace file writeback | File Systems |
| xfsdist | XFS operation latency | File Systems |
| mdflush | Trace md RAID flush events | Volume Manager |

### Block Device Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| biosnoop | Trace block I/O | Block Device |
| biolatency | Block I/O latency | Block Device |
| bitesize | Block I/O size analysis | Block Device |

### Network Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| tcpconnect | Trace TCP connections | TCP/UDP |
| tcpaccept | Trace TCP accepts | TCP/UDP |
| tcpretrans | Trace TCP retransmits | TCP/UDP |
| tcpdrop | Trace TCP drops | TCP/UDP |

### CPU/Scheduler Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| cpuwalk | Sample which CPUs are executing processes | Scheduler |
| runqlat | Run queue latency | Scheduler |
| runqlen | Run queue length | Scheduler |
| offcputime | Off-CPU analysis | Scheduler |

### Memory and Security Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| oomkill | Trace OOM killer | Virtual Memory |
| capable | Trace capability checks | Security |

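The tools in these tables are installed under different names depending on the distribution and on whether the BCC or the bpftrace version is packaged. The following is a minimal sketch of a lookup helper. The function name `find_tool` is hypothetical, and the candidate names and directories (`-bpfcc` suffix, `.bt` suffix, `/usr/share/bcc/tools`, `/usr/share/bpftrace/tools`) are assumptions about common packaging layouts rather than a complete list.

```bash
# Print the first installed variant of a tool, trying common packaging conventions.
# The candidate names and directories are assumptions about typical distro layouts.
find_tool() {
    for candidate in \
        "$1" \
        "$1-bpfcc" \
        "$1.bt" \
        "/usr/share/bcc/tools/$1" \
        "/usr/share/bpftrace/tools/$1.bt"
    do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    echo "$1: not found" >&2
    return 1
}
```

A call such as `"$(find_tool opensnoop)" -p 1234` then runs whichever variant was found; remember that the BCC and bpftrace variants do not accept the same options.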
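## Detailed Tool Analysis

The options shown in this section are those of the BCC versions of the tools. The bpftrace versions (`*.bt`) generally take no options. All of these tools need root privileges, and flags change between releases, so confirm them with `-h` before relying on them.

### Application Monitoring Tools

#### opensnoop
```bash
# Trace all file opens
opensnoop

# Trace a specific process
opensnoop -p 1234

# Show only failed opens
opensnoop -x

# Filter by process name (-n matches the process name, not the file name)
opensnoop -n nginx

# Filter by file name by post-processing the output
opensnoop | grep '\.txt$'
```
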
#### statsnoop
```bash
# Trace all stat() calls
statsnoop

# Show failed stats only
statsnoop -x

# Filter by process (statsnoop takes a PID, so resolve the name first)
statsnoop -p "$(pgrep -o nginx)"

# Include timestamps
statsnoop -t
```

#### bashreadline
```bash
# Trace commands entered in all running bash shells
# (each output line already includes the time and the shell's PID)
bashreadline

# Use a shared libreadline when bash is dynamically linked against one
# (the path is an example and varies by distribution)
bashreadline -s /lib/libreadline.so
```

### Network Analysis Tools

#### tcpconnect
```bash
# Trace all TCP connections
tcpconnect

# Include the local port (the destination port is always shown)
tcpconnect -L

# Include timestamps
tcpconnect -t

# Filter by destination port
tcpconnect -P 80
```

#### tcpretrans
```bash
# Trace TCP retransmissions (the TCP state is part of the default output)
tcpretrans

# Include tail loss probe attempts
tcpretrans -l

# Count retransmits per flow instead of printing each event
tcpretrans -c
```

### File System Analysis

#### vfscount
```bash
# Count VFS calls by kernel function until Ctrl-C
vfscount

# Count for 10 seconds, then print the summary
vfscount 10

# For per-second counts grouped by operation type, use vfsstat
vfsstat 1
```

#### writeback
```bash
# Trace file writeback events
# (bpftrace tool with no options; each line shows the time, device,
# pages written, reason, and duration)
writeback.bt
```

### Block Device Analysis

#### biosnoop
```bash
# Trace block I/O (each line includes the issuing process name and PID)
biosnoop

# Include OS queued time
biosnoop -Q

# Filter by device (available in recent BCC releases)
biosnoop -d sda
```

#### biolatency
```bash
# Print a block I/O latency histogram (in microseconds) on Ctrl-C
biolatency

# Use millisecond units
biolatency -m

# Print one histogram per disk
biolatency -D

# Filter by device (available in recent BCC releases)
biolatency -d sda
```

### CPU and Scheduler Analysis

#### runqlat
```bash
# Print a run queue latency histogram (in microseconds) on Ctrl-C
runqlat

# Use millisecond units
runqlat -m

# Filter by process
runqlat -p 1234

# Print 1-second summaries, 10 times
runqlat 1 10
```

#### offcputime
```bash
# Trace off-CPU time with stack traces until Ctrl-C
offcputime

# Filter by process
offcputime -p 1234

# Trace for 10 seconds (the duration is a positional argument)
offcputime 10

# Show user-space stacks only
offcputime -U
```

## Command Reference

### General Options
Options differ from tool to tool, and the bpftrace (`*.bt`) versions usually take none. The following are common among the BCC versions but not universal, so check each tool's `-h` output:
```bash
-h                # Show help message
-p PID            # Filter by process ID (many per-event tools)
-t or -T          # Include timestamps (the letter varies by tool)
-x                # Show only failed calls (opensnoop, statsnoop)
interval [count]  # Print periodic summaries (biolatency, runqlat)
```

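Because flags are not uniform, a script that wraps these tools can check a tool's help text before passing an option. This is a small sketch, assuming the tool prints its usage when given `-h`; the function name `tool_has_flag` is hypothetical.

```bash
# Return success if the tool's help text lists the given flag as a whole token.
# Assumes the tool prints its usage for -h.
tool_has_flag() {
    # Split the help text into tokens made of letters, digits, '_' and '-',
    # then look for a token that is exactly the flag.
    "$1" -h 2>&1 | tr -c 'A-Za-z0-9_-' '\n' | grep -q -x -F -e "$2"
}
```

For example, `tool_has_flag opensnoop -T && opensnoop -T` only requests timestamps when the installed version advertises that flag. Matching whole tokens avoids treating `--stack` as present just because `--stack-storage-size` is.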
### Advanced Usage

#### Custom Scripts
```bash
# Create a custom bpftrace script
cat > custom.bt << 'EOF'
#!/usr/bin/env bpftrace
// Most programs open files through openat(2) rather than open(2)
tracepoint:syscalls:sys_enter_openat
{
    printf("%s opened %s\n", comm, str(args->filename));
}
EOF

# Run the custom script (requires root)
bpftrace custom.bt
```

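A custom script can also take arguments. The sketch below writes a variant that only reports opens from one process, using a bpftrace predicate and the positional parameter `$1`. The file name `open-by-pid.bt` is arbitrary, and newer bpftrace releases also accept `args.filename` in place of `args->filename`.

```bash
# Write a bpftrace script that prints opens for a single PID.
# $1 inside the script is bpftrace's first positional parameter, not a shell
# variable; the quoted 'EOF' keeps the shell from expanding it.
cat > open-by-pid.bt << 'EOF'
#!/usr/bin/env bpftrace
tracepoint:syscalls:sys_enter_openat
/pid == $1/
{
    printf("%s opened %s\n", comm, str(args->filename));
}
EOF

# Run it against PID 1234 (requires root):
#   bpftrace open-by-pid.bt 1234
```

Because the predicate is evaluated in the kernel, events from other processes are discarded before they reach user space.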
#### Performance Monitoring
```bash
# Print syscall counts every second
syscount -i 1

# Print the number of new processes each second (pidpersec takes no options)
pidpersec

# Trace OOM kills (each line includes the time of the kill)
oomkill
```

### Best Practices

1. **Resource Usage**
   - Be cautious with stack traces in production
   - Use sampling for high-frequency events
   - Monitor overhead with top/htop

2. **Filtering**
   - Use specific filters to reduce overhead
   - Combine multiple conditions when possible
   - Consider using time-based filters

3. **Output Control**
   - Use appropriate output formats
   - Consider logging to files for analysis
   - Use aggregation for high-volume data

4. **Troubleshooting**
   - Start with broad tools
   - Narrow down to specific events
   - Use multiple tools for correlation

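As an illustration of aggregation combined with a time limit, the following sketch counts syscalls per process name in a kernel map and stops on its own after ten seconds. The script name `syscalls-by-comm.bt`, the map name `@calls`, and the 10-second limit are arbitrary choices.

```bash
# Write a bpftrace script that aggregates in the kernel instead of printing per event.
cat > syscalls-by-comm.bt << 'EOF'
#!/usr/bin/env bpftrace
// Count syscalls per process name in a map; nothing is printed per event.
tracepoint:raw_syscalls:sys_enter
{
    @calls[comm] = count();
}

// Stop after 10 seconds; bpftrace prints the map when it exits.
interval:s:10
{
    exit();
}
EOF

# Run it (requires root):
#   bpftrace syscalls-by-comm.bt
```

Only the final per-process totals cross from the kernel to user space, which keeps the cost low even when the event rate is high.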
## Performance Considerations

### Overhead Management
```bash
# Summarize in the kernel instead of printing every event:
# one latency histogram every 10 seconds
biolatency 10

# Filter in the kernel by PID rather than filtering the output afterwards
opensnoop -p 1234

# Cap the number of unique stack traces stored, and trace for 10 seconds
offcputime --stack-storage-size 1024 10
```

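Sampling is another way to bound overhead: a timed probe fires at a fixed rate no matter how busy the workload is. A minimal sketch using bpftrace's `profile` probe follows; the function name `sample_oncpu`, the 99 Hz rate, and the default of 5 seconds are illustrative choices.

```bash
# Sample the on-CPU process name at 99 Hz on every CPU for a fixed number of
# seconds, then print how many samples each process name received.
# Requires root and an installed bpftrace.
sample_oncpu() {
    seconds=${1:-5}
    bpftrace -e "profile:hz:99 { @[comm] = count(); } interval:s:${seconds} { exit(); }"
}
```

`sample_oncpu 10` gives a rough picture of which processes were on-CPU during those ten seconds, at a cost that does not grow with the event rate.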
### Production Usage
1. Test tools in development first
2. Use appropriate filtering
3. Monitor system impact
4. Set appropriate buffer sizes
5. Use time-based execution limits

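For tools that have no duration argument of their own, a time-based execution limit can be imposed from outside. This sketch wraps the `timeout` utility; the function name `run_limited` is hypothetical. It sends SIGINT so that the tool stops as it would on Ctrl-C, which for many of these tools is what triggers the final summary.

```bash
# Run a tracing command for at most N seconds, then interrupt it as Ctrl-C would.
# Usage: run_limited SECONDS COMMAND [ARGS...]
run_limited() {
    seconds=$1
    shift
    timeout -s INT "$seconds" "$@"
}
```

For example, `run_limited 30 tcpdrop` traces TCP drops for thirty seconds. `timeout` exits with status 124 when the limit was reached, so a wrapper script can tell a timed-out run from one that failed on its own.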
## Common Use Cases

### Performance Analysis
```bash
# Analyze disk I/O
biolatency
biosnoop -Q

# Network performance
tcpretrans
tcpconnect -t

# CPU scheduling
runqlat
offcputime -p 1234
```

### Troubleshooting
```bash
# File system issues
opensnoop -T
vfscount

# Network problems
tcpdrop
tcpretrans

# Memory issues
oomkill
```

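During troubleshooting it often helps to keep each tool's output for later comparison. The sketch below saves a command's output to a timestamped log file. The function name `log_trace` and the `<name>-<timestamp>.log` naming scheme are arbitrary choices for illustration.

```bash
# Run a command and save its output (stdout and stderr) to a timestamped log file.
# Usage: log_trace NAME COMMAND [ARGS...]
log_trace() {
    name=$1
    shift
    logfile="$name-$(date +%Y%m%dT%H%M%S).log"
    "$@" > "$logfile" 2>&1
    status=$?
    echo "output saved to $logfile"
    return "$status"
}
```

For example, `log_trace tcpdrop timeout -s INT 60 tcpdrop` records one minute of TCP drop events. Timestamps in the file names make it easier to line up logs from several tools that were run around the same time.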
### Security Monitoring
```bash
# Track capability checks (-v also includes non-audit checks)
capable -v

# Monitor process creation
execsnoop -t

# Track file access
opensnoop -T
```