API Documentation Examples¶
This notebook demonstrates the API modules documented in the different docstring styles, providing interactive examples of the DataProcessor class from each implementation.
Overview¶
We'll explore three identical implementations of a data processing pipeline, each documented with a different docstring style:
- Google Style - Clean, readable format popular in open-source projects
- NumPy Style - Structured format ideal for scientific computing
- Sphinx Style - Traditional format with rich reStructuredText markup
All three implementations provide the same functionality, allowing direct comparison of documentation styles.
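To ground the comparison before the demos, here is a minimal, hypothetical example (not taken from the package itself) of the same small function documented in each of the three styles:
def scale_google(values, factor):
    """Scale a list of numbers by a constant factor.

    Args:
        values (list[float]): Numbers to scale.
        factor (float): Multiplier applied to each number.

    Returns:
        list[float]: The scaled numbers.
    """
    return [v * factor for v in values]

def scale_numpy(values, factor):
    """Scale a list of numbers by a constant factor.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float
        Multiplier applied to each number.

    Returns
    -------
    list of float
        The scaled numbers.
    """
    return [v * factor for v in values]

def scale_sphinx(values, factor):
    """Scale a list of numbers by a constant factor.

    :param values: Numbers to scale.
    :type values: list[float]
    :param factor: Multiplier applied to each number.
    :type factor: float
    :returns: The scaled numbers.
    :rtype: list[float]
    """
    return [v * factor for v in values]
The body is identical in all three cases; only the docstring markup changes, which is exactly the situation in the three DataProcessor modules explored below.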
Setup and Imports¶
First, let's import all three DataProcessor implementations:
# Import all three docstring style implementations
import os
import sys
# Add the src directory to Python path for imports
sys.path.insert(0, os.path.join(os.getcwd(), "..", "..", "src"))
# Import supporting modules for examples
import json
import pprint
import tempfile
from pathlib import Path
from docstring_examples.google_style import DataProcessor as GoogleDataProcessor
from docstring_examples.numpy_style import DataProcessor as NumpyDataProcessor
from docstring_examples.sphinx_style import DataProcessor as SphinxDataProcessor
print("✅ Successfully imported all DataProcessor implementations")
print(f"📍 Working directory: {os.getcwd()}")
Sample Data¶
Let's create some sample data for our demonstrations:
# Create sample datasets for different scenarios
# Dataset 1: E-commerce sales data
sales_data = [
    {
        "product_id": "WIDGET-001",
        "product_name": "premium widget",
        "sales": 1250.00,
        "region": "north america",
        "category": "electronics",
        "active": True,
    },
    {
        "product_id": "GADGET-002",
        "product_name": "ultra gadget",
        "sales": 875.50,
        "region": "europe",
        "category": "electronics",
        "active": True,
    },
    {
        "product_id": "TOOL-003",
        "product_name": "basic tool",
        "sales": 425.75,
        "region": "asia pacific",
        "category": "tools",
        "active": False,
    },
    {
        "product_id": "DEVICE-004",
        "product_name": "smart device",
        "sales": 1680.25,
        "region": "north america",
        "category": "electronics",
        "active": True,
    },
]
# Dataset 2: Research experiment data
experiment_data = [
    {
        "experiment_id": "EXP-2024-001",
        "temperature": 23.5,
        "pressure": 101.3,
        "humidity": 45.2,
        "result": "success",
        "duration_minutes": 120,
    },
    {
        "experiment_id": "EXP-2024-002",
        "temperature": 25.1,
        "pressure": 99.8,
        "humidity": 52.7,
        "result": "partial",
        "duration_minutes": 95,
    },
    {
        "experiment_id": "EXP-2024-003",
        "temperature": 22.8,
        "pressure": 102.1,
        "humidity": 38.9,
        "result": "failure",
        "duration_minutes": 75,
    },
]
print("📊 Sample datasets created:")
print(f" 💰 Sales data: {len(sales_data)} records")
print(f" 🔬 Experiment data: {len(experiment_data)} records")
# Display first record from each dataset
print("\n📋 Sample records:")
print("Sales:", sales_data[0])
print("Experiment:", experiment_data[0])
Google-Style DataProcessor Demo¶
Let's demonstrate the Google-style documented DataProcessor with e-commerce data processing:
print("🎯 Google-Style DataProcessor Demo")
print("=" * 50)
# Create processor with Google-style documentation
google_processor = GoogleDataProcessor(
    name="sales_analytics", validation_enabled=True, max_transformations=10
)
print(f"✅ Created processor: {google_processor}")
# Load the sales data
google_processor.load_data(sales_data)
print(f"📥 Loaded {len(sales_data)} sales records")
# Apply transformations to clean and standardize the data
def standardize_text(value):
    """Convert text to title case and clean up formatting."""
    if isinstance(value, str):
        return value.strip().title()
    return value
transform_result = google_processor.transform_data(standardize_text)
print(f"🔄 Transformation result: {transform_result}")
# Filter for high-performing active products
filter_result = google_processor.apply_filter(
    lambda record: record.get("sales", 0) > 800 and record.get("active", False)
)
print(f"🔍 Filter result: {filter_result}")
# Get comprehensive statistics
stats = google_processor.get_statistics()
print("\n📊 Google Processor Statistics:")
pprint.pprint(stats, width=80)
# Export processed data to temporary file
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as tmp_file:
    google_processor.export_data(tmp_file.name)
    temp_file_path = tmp_file.name
print(f"💾 Exported data to: {temp_file_path}")
# Read and display the exported data (the transformation title-cases values,
# so the record keys remain lowercase)
with open(temp_file_path, "r") as f:
    exported_data = json.load(f)
print("\n📄 Exported data preview:")
for record in exported_data["records"]:
    print(f" - {record['product_name']}: ${record['sales']:,.2f} ({record['region']})")
# Clean up temporary file
os.unlink(temp_file_path)
NumPy-Style DataProcessor Demo¶
Now let's demonstrate the NumPy-style documented DataProcessor with scientific experiment data:
print("🔬 NumPy-Style DataProcessor Demo")
print("=" * 50)
# Create processor with NumPy-style documentation
numpy_processor = NumpyDataProcessor(
    name="experiment_analysis", validation_enabled=True, max_transformations=15
)
print(f"✅ Created processor: {numpy_processor}")
# Load the experiment data
numpy_processor.load_data(experiment_data)
print(f"📥 Loaded {len(experiment_data)} experiment records")
# Apply scientific transformations
def normalize_measurements(value):
    """Round numeric measurements to a precision appropriate for their range."""
    if isinstance(value, (int, float)):
        # Choose rounding precision based on the expected measurement range
        if 20 <= value <= 30:  # Temperature range (°C)
            return round(value, 2)
        elif 95 <= value <= 105:  # Pressure range (kPa)
            return round(value, 1)
        elif 30 <= value <= 60:  # Humidity range (%)
            return round(value, 1)
        else:
            return round(value, 2)
    return value
transform_result = numpy_processor.transform_data(normalize_measurements)
print(f"🔄 Transformation result: {transform_result}")
# Filter for successful experiments with optimal conditions
def optimal_conditions_filter(record):
    """Filter for experiments with optimal environmental conditions."""
    temp_ok = 22.0 <= record.get("temperature", 0) <= 26.0
    pressure_ok = 99.0 <= record.get("pressure", 0) <= 103.0
    humidity_ok = 40.0 <= record.get("humidity", 0) <= 55.0
    return temp_ok and pressure_ok and humidity_ok
filter_result = numpy_processor.apply_filter(optimal_conditions_filter)
print(f"🔍 Filter result: {filter_result}")
# Additional filter for duration analysis
duration_filter = numpy_processor.apply_filter(
    lambda record: record.get("duration_minutes", 0) >= 90
)
print(f"⏱️ Duration filter result: {duration_filter}")
# Get comprehensive statistics
stats = numpy_processor.get_statistics()
print("\n📊 NumPy Processor Statistics:")
pprint.pprint(stats, width=80)
# Export processed experiment data
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as tmp_file:
    numpy_processor.export_data(tmp_file.name)
    temp_file_path = tmp_file.name
print(f"💾 Exported data to: {temp_file_path}")
# Read and display the exported data
with open(temp_file_path, "r") as f:
    exported_data = json.load(f)
print("\n📄 Exported experiment data:")
for record in exported_data["records"]:
    print(
        f" - {record['experiment_id']}: T={record['temperature']}°C, "
        f"P={record['pressure']}kPa, H={record['humidity']}% "
        f"({record['duration_minutes']}min - {record['result']})"
    )
# Clean up temporary file
os.unlink(temp_file_path)
Sphinx-Style DataProcessor Demo¶
Finally, let's demonstrate the Sphinx-style documented DataProcessor with document processing:
print("📚 Sphinx-Style DataProcessor Demo")
print("=" * 50)
# Create document metadata for processing
document_data = [
    {
        "doc_id": "DOC-2024-001",
        "title": "machine learning fundamentals",
        "author": "dr. sarah chen",
        "pages": 156,
        "language": "english",
        "category": "technical",
        "citations": 45,
        "status": "published",
    },
    {
        "doc_id": "DOC-2024-002",
        "title": "data visualization techniques",
        "author": "prof. james wilson",
        "pages": 89,
        "language": "english",
        "category": "tutorial",
        "citations": 23,
        "status": "draft",
    },
    {
        "doc_id": "DOC-2024-003",
        "title": "statistical analysis methods",
        "author": "dr. maria rodriguez",
        "pages": 203,
        "language": "english",
        "category": "reference",
        "citations": 67,
        "status": "published",
    },
]
# Create processor with Sphinx-style documentation
sphinx_processor = SphinxDataProcessor(
    name="document_processor", validation_enabled=True, max_transformations=20
)
print(f"✅ Created processor: {sphinx_processor}")
# Load the document data
sphinx_processor.load_data(document_data)
print(f"📥 Loaded {len(document_data)} document records")
# Apply document transformations
def format_academic_text(value):
    """Format academic text fields with proper capitalization."""
    if isinstance(value, str):
        # str.title() also capitalizes honorifics such as "dr." -> "Dr."
        return value.strip().title()
    return value
transform_result = sphinx_processor.transform_data(format_academic_text)
print(f"🔄 Transformation result: {transform_result}")
# Filter for substantial published documents
substantial_docs_filter = sphinx_processor.apply_filter(
    lambda doc: (
        doc.get("pages", 0) > 100
        and doc.get("status") == "Published"
        and doc.get("citations", 0) > 30
    )
)
print(f"🔍 Substantial documents filter: {substantial_docs_filter}")
# Additional filter for technical category
technical_filter = sphinx_processor.apply_filter(
    lambda doc: doc.get("category", "").lower() in ["technical", "reference"]
)
print(f"📖 Technical documents filter: {technical_filter}")
# Get comprehensive statistics
stats = sphinx_processor.get_statistics()
print("\n📊 Sphinx Processor Statistics:")
pprint.pprint(stats, width=80)
# Export processed document metadata
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as tmp_file:
    sphinx_processor.export_data(tmp_file.name)
    temp_file_path = tmp_file.name
print(f"💾 Exported data to: {temp_file_path}")
# Read and display the exported data
with open(temp_file_path, "r") as f:
    exported_data = json.load(f)
print("\n📄 Exported document metadata:")
for record in exported_data["records"]:
    print(f" - {record['title']} by {record['author']}")
    print(
        f" 📄 {record['pages']} pages, 📝 {record['citations']} citations ({record['category']})"
    )
# Clean up temporary file
os.unlink(temp_file_path)
Performance and Error Handling Demo¶
Let's demonstrate error handling and performance characteristics across all three implementations:
print("⚡ Performance and Error Handling Demo")
print("=" * 50)
import time
from docstring_examples.google_style import ProcessingError
# Test error handling
print("🚨 Error Handling Tests:")
try:
    # Test invalid processor name
    invalid_processor = GoogleDataProcessor("")
except ValueError as e:
    print(f"✅ Caught expected ValueError: {e}")

try:
    # Test operation on inactive processor
    test_processor = GoogleDataProcessor("test")
    test_processor.deactivate()
    test_processor.load_data([{"test": "data"}])
except ProcessingError as e:
    print(f"✅ Caught expected ProcessingError: {e}")

try:
    # Test transformation without data
    empty_processor = GoogleDataProcessor("empty")
    empty_processor.transform_data(str.upper)
except ProcessingError as e:
    print(f"✅ Caught expected ProcessingError: {e}")
print("\n⚡ Performance Comparison:")
# Create large dataset for performance testing
large_dataset = []
for i in range(1000):
    large_dataset.append(
        {
            "id": f"ITEM-{i:04d}",
            "value": i * 1.5,
            "category": f"category_{i % 10}",
            "active": i % 3 == 0,
        }
    )
print(f"📊 Testing with {len(large_dataset)} records...")
# Test each implementation
processors = [
    ("Google", GoogleDataProcessor),
    ("NumPy", NumpyDataProcessor),
    ("Sphinx", SphinxDataProcessor),
]
for style_name, ProcessorClass in processors:
    start_time = time.time()
    # Create and use processor
    processor = ProcessorClass(f"perf_test_{style_name.lower()}")
    processor.load_data(large_dataset)
    # Apply transformation
    processor.transform_data(lambda x: x * 1.1 if isinstance(x, (int, float)) else x)
    # Apply filter
    processor.apply_filter(lambda r: r.get("active", False))
    end_time = time.time()
    duration = end_time - start_time
    stats = processor.get_statistics()
    print(
        f" {style_name} Style: {duration:.3f}s - "
        f"{stats['record_count']} records remaining"
    )
print("\n✅ All performance tests completed successfully!")
Conclusion¶
This notebook has demonstrated the practical usage of all three docstring style implementations:
Key Takeaways:¶
- Identical Functionality: All three implementations provide the same core features
- Documentation Styles: Each style has different strengths for different contexts (see the comparison sketch after this list)
- Error Handling: Consistent error handling across all implementations
- Performance: Similar performance characteristics regardless of docstring style
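To see these differences for yourself, the short sketch below prints the docstring of the same method from each implementation. It assumes the imports from the setup cell at the top of this notebook; the exact text printed depends on the installed package version.
# Print the same method's docstring from each implementation for a
# side-by-side comparison of the three styles.
for style_name, cls in [
    ("Google", GoogleDataProcessor),
    ("NumPy", NumpyDataProcessor),
    ("Sphinx", SphinxDataProcessor),
]:
    print(f"----- {style_name} style: DataProcessor.load_data -----")
    print(cls.load_data.__doc__)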
Style Recommendations:¶
- Google Style: Best for general-purpose projects and team collaboration
- NumPy Style: Ideal for scientific computing and data analysis projects
- Sphinx Style: Perfect for complex projects requiring rich documentation
Next Steps:¶
- Explore the API Documentation to see the generated docs
- Compare the styles in detail with the Style Comparison
- Choose the appropriate style for your project's needs
- Implement consistent documentation across your codebase (a simple docstring-coverage check is sketched below)
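As a starting point for that last step, here is a small, hypothetical consistency check that reuses the classes imported earlier and flags any public method missing a docstring. It is a rough sketch rather than a substitute for a proper docstring linter.
import inspect

# Rough docstring-coverage check: list public methods without a docstring.
for style_name, cls in [
    ("Google", GoogleDataProcessor),
    ("NumPy", NumpyDataProcessor),
    ("Sphinx", SphinxDataProcessor),
]:
    missing = [
        name
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_") and not (member.__doc__ or "").strip()
    ]
    if missing:
        print(f"⚠️ {style_name} style is missing docstrings on: {missing}")
    else:
        print(f"✅ {style_name} style: all public methods documented")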