Ingesting IP Netblocks with DynamoDB
Introduction
Overview
DynamoDB is Amazon's fully managed NoSQL database service that's designed for high-performance applications at scale. DynamoDB is known for its single-digit millisecond response times, automatic scaling, and built-in security features.
Key characteristics include:
- Data Model: Uses a key-value and document data model with tables, items, and attributes. Each item must have a primary key (either a partition key alone, or a partition key plus sort key).
- Performance: Offers consistent, fast performance with automatic scaling up or down based on traffic patterns.
- Availability: Provides multi-Region, multi-active replication via global tables, plus built-in fault tolerance.
- Consistency Models: Supports both eventually consistent reads (default) and strongly consistent reads.
- Common Use Cases: Session management, gaming leaderboards, IoT data storage, mobile app backends, and real-time analytics.
- Integration: Works seamlessly with other AWS services like Lambda, API Gateway, and DynamoDB Streams for real-time data processing.
IP netblock data from WHOISXMLAPI would benefit from DynamoDB's fast lookups and scalability, especially if you're planning to do IP range queries or lookups.
IP Netblock Questions and Use Cases
Data Feed Questions You Should Prepare For:
- What format do you plan to use? JSON or CSV?
- How large is the dataset per day, week, and month?
- How frequently do you plan to update?
- What are the key fields in each record you plan to use (IP ranges, organization info, ASN, etc.)?
Plan Your Use Case:
- What types of queries will you be running? (IP lookups, range searches, organization queries?)
- Do you need real-time ingestion or batch processing?
- Are you planning to query by specific IP addresses to find which netblock they belong to?
Technical Considerations: The main challenge with IP netblock data in DynamoDB is that the data is inherently range-based, while DynamoDB excels at exact-key lookups. Given that mismatch, you'll need to consider:
- Primary Key Design - How to structure partition/sort keys for efficient queries
- IP Range Representation - Converting CIDR blocks to queryable formats
- Ingestion Strategy - Batch writes vs streaming
- Query Patterns - Supporting both exact matches and range queries
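The range mismatch is usually handled by storing each netblock's boundaries as 32-bit unsigned integers, so that "does this IP fall in this block?" becomes a numeric comparison. A minimal sketch of the conversion (the helper names are illustrative, and the input is assumed to be a properly aligned CIDR block):

```javascript
// Convert a dotted-quad IPv4 address to a 32-bit unsigned integer
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
}

// Expand "a.b.c.d/prefix" into its inclusive [start, end] integer range
function cidrToRange(cidr) {
  const [base, prefixStr] = cidr.split('/');
  const prefix = parseInt(prefixStr, 10);
  const start = ipToInt(base);
  const size = 2 ** (32 - prefix); // number of addresses in the block
  return [start, start + size - 1];
}

console.log(cidrToRange('1.0.194.0/24')); // [16826880, 16827135]
```

An address x belongs to the block exactly when `start <= ipToInt(x) && ipToInt(x) <= end`, which is the comparison the table designs below are built around.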
DynamoDB IP Netblock Implementation Guide
This guide walks you through implementing a scalable IP netblock lookup system using AWS DynamoDB and the WHOISXMLAPI data feed.
Table of Contents
- Prerequisites
- AWS Setup
- Project Setup
- Database Design
- Implementation Steps
- Testing
- Production Deployment
- Performance Optimization
- Monitoring
- Troubleshooting
Prerequisites
Technical Requirements
- Node.js 16+ installed
- AWS CLI configured
- Git for version control
- Basic knowledge of JavaScript, AWS, and DynamoDB
AWS Requirements
- AWS Account with appropriate permissions
- IAM user with DynamoDB access
- Estimated monthly cost: $20-100 depending on data size and query volume
Data Requirements
- Access to WHOISXMLAPI IP Netblock data feed
- Sample CSV file for testing
AWS Setup
1. Create IAM User
# Create IAM user for the application
aws iam create-user --user-name ip-netblock-service
# Create access key
aws iam create-access-key --user-name ip-netblock-service
2. Create IAM Policy
Create a file dynamodb-policy.json:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:CreateTable",
"dynamodb:DescribeTable",
"dynamodb:PutItem",
"dynamodb:BatchWriteItem",
"dynamodb:GetItem",
"dynamodb:Query",
"dynamodb:Scan",
"dynamodb:UpdateItem",
"dynamodb:DeleteItem"
],
"Resource": [
"arn:aws:dynamodb:*:*:table/ip-netblocks*"
]
}
]
}
Attach the policy:
aws iam put-user-policy --user-name ip-netblock-service --policy-name DynamoDBIPNetblockPolicy --policy-document file://dynamodb-policy.json
3. Configure AWS Credentials
aws configure
# Enter your access key, secret key, and region (e.g., us-east-1)
Project Setup
1. Initialize Project
mkdir ip-netblock-system
cd ip-netblock-system
npm init -y
2. Install Dependencies
npm install aws-sdk papaparse dotenv
npm install --save-dev jest nodemon
Note: this guide uses the AWS SDK for JavaScript v2 (aws-sdk), which is now in maintenance mode. The same patterns carry over to v3 (@aws-sdk/client-dynamodb and @aws-sdk/lib-dynamodb) with minor API changes.
3. Project Structure
ip-netblock-system/
├── src/
│ ├── config/
│ │ └── aws-config.js
│ ├── models/
│ │ ├── ip-processor.js
│ │ └── dynamodb-operations.js
│ ├── services/
│ │ ├── ingestion-service.js
│ │ └── lookup-service.js
│ └── utils/
│ └── logger.js
├── scripts/
│ ├── create-tables.js
│ ├── ingest-data.js
│ └── daily-batch.js
├── tests/
│ └── lookup-service.test.js
├── data/
│ └── sample-data.csv
├── .env
├── .gitignore
└── package.json
4. Environment Configuration
Create .env file:
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
DYNAMODB_TABLE_NAME=ip-netblocks
DYNAMODB_LOOKUP_TABLE_NAME=ip-netblocks-lookup
LOG_LEVEL=info
Database Design
Main Table Schema
Table: ip-netblocks
Primary Key:
- Partition Key: ip_range_start (Number)
- Sort Key: ip_range_end (Number)
Global Secondary Indexes:
- ASN-Index: as_number (Number)
- Country-Index: country (String)
Attributes: All netblock data from WHOISXMLAPI
Lookup Table Schema (for fast queries)
Table: ip-netblocks-lookup
Primary Key:
- Partition Key: ip_shard (String) - First 2 octets
- Sort Key: ip_range_key (String) - Sortable range identifier
Purpose: Enable fast IP lookups by sharding
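One way to derive these keys (the exact format is a design choice for this guide, not something DynamoDB mandates; zero-padding the integers makes lexicographic string order match numeric order):

```javascript
// Convert a dotted-quad IPv4 address to a 32-bit unsigned integer
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
}

// Partition key: first two octets, e.g. "1.0" — groups up to ~65K addresses per shard
function ipShard(ip) {
  return ip.split('.').slice(0, 2).join('.');
}

// Sort key: zero-padded start/end integers, so string comparison equals numeric comparison
function ipRangeKey(startIp, endIp) {
  const pad = ip => String(ipToInt(ip)).padStart(10, '0');
  return `${pad(startIp)}#${pad(endIp)}`;
}

console.log(ipShard('1.0.194.100'));                  // "1.0"
console.log(ipRangeKey('1.0.194.0', '1.0.194.255'));  // "0016826880#0016827135"
```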
Implementation Steps
Step 1: Create Configuration Files
src/config/aws-config.js
const AWS = require('aws-sdk');
require('dotenv').config();
AWS.config.update({
region: process.env.AWS_REGION,
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY
});
const dynamodb = new AWS.DynamoDB.DocumentClient();
const dynamodbClient = new AWS.DynamoDB();
module.exports = {
dynamodb,
dynamodbClient,
TABLE_NAME: process.env.DYNAMODB_TABLE_NAME,
LOOKUP_TABLE_NAME: process.env.DYNAMODB_LOOKUP_TABLE_NAME
};
Step 2: Create Tables
scripts/create-tables.js
const { dynamodbClient, TABLE_NAME, LOOKUP_TABLE_NAME } = require('../src/config/aws-config');
async function createTables() {
// Create main table
const mainTableParams = {
TableName: TABLE_NAME,
KeySchema: [
{ AttributeName: 'ip_range_start', KeyType: 'HASH' },
{ AttributeName: 'ip_range_end', KeyType: 'RANGE' }
],
AttributeDefinitions: [
{ AttributeName: 'ip_range_start', AttributeType: 'N' },
{ AttributeName: 'ip_range_end', AttributeType: 'N' },
{ AttributeName: 'as_number', AttributeType: 'N' },
{ AttributeName: 'country', AttributeType: 'S' }
],
BillingMode: 'PAY_PER_REQUEST',
GlobalSecondaryIndexes: [
{
IndexName: 'ASN-Index',
KeySchema: [{ AttributeName: 'as_number', KeyType: 'HASH' }],
Projection: { ProjectionType: 'ALL' }
},
{
IndexName: 'Country-Index',
KeySchema: [{ AttributeName: 'country', KeyType: 'HASH' }],
Projection: { ProjectionType: 'ALL' }
}
]
};
// Create lookup table
const lookupTableParams = {
TableName: LOOKUP_TABLE_NAME,
KeySchema: [
{ AttributeName: 'ip_shard', KeyType: 'HASH' },
{ AttributeName: 'ip_range_key', KeyType: 'RANGE' }
],
AttributeDefinitions: [
{ AttributeName: 'ip_shard', AttributeType: 'S' },
{ AttributeName: 'ip_range_key', AttributeType: 'S' }
],
BillingMode: 'PAY_PER_REQUEST'
};
try {
await dynamodbClient.createTable(mainTableParams).promise();
console.log(`Created table: ${TABLE_NAME}`);
await dynamodbClient.createTable(lookupTableParams).promise();
console.log(`Created table: ${LOOKUP_TABLE_NAME}`);
// Wait for tables to be active
await dynamodbClient.waitFor('tableExists', { TableName: TABLE_NAME }).promise();
await dynamodbClient.waitFor('tableExists', { TableName: LOOKUP_TABLE_NAME }).promise();
console.log('All tables are active');
} catch (error) {
console.error('Error creating tables:', error);
}
}
if (require.main === module) {
createTables();
}
module.exports = { createTables };
Step 3: Implement Data Processing
src/models/ip-processor.js
class IPProcessor {
static ipToInt(ip) {
return ip.split('.').reduce((acc, octet) => (acc << 8) + parseInt(octet), 0) >>> 0;
}
static intToIp(int) {
return [
(int >>> 24) & 255,
(int >>> 16) & 255,
(int >>> 8) & 255,
int & 255
].join('.');
}
static parseIpRange(rangeStr) {
const [startIp, endIp] = rangeStr.split(' - ');
return {
startIp: startIp.trim(),
endIp: endIp.trim(),
startInt: this.ipToInt(startIp.trim()),
endInt: this.ipToInt(endIp.trim())
};
}
static transformRecord(record) {
const ipRange = this.parseIpRange(record.inetnum);
return {
ip_range_start: ipRange.startInt,
ip_range_end: ipRange.endInt,
inetnum: record.inetnum,
start_ip: ipRange.startIp,
end_ip: ipRange.endIp,
as_number: record.as_number,
as_name: record.as_name,
as_route: record.as_route,
as_domain: record.as_domain,
as_type: record.as_type,
netname: record.netname,
country: record.country,
city: record.city,
org_id: record.org_id,
source: record.source,
modified: record.modified,
ingested_at: new Date().toISOString()
};
}
}
module.exports = IPProcessor;
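The two conversions should round-trip exactly across the whole IPv4 space. A quick standalone sanity check, with the helpers inlined so the snippet runs without the module:

```javascript
// Inlined copies of IPProcessor.ipToInt / intToIp for a standalone check
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
}

function intToIp(int) {
  return [(int >>> 24) & 255, (int >>> 16) & 255, (int >>> 8) & 255, int & 255].join('.');
}

// Round-trip a few edge cases, including the top of the IPv4 space
for (const ip of ['0.0.0.0', '1.0.194.100', '8.8.8.8', '255.255.255.255']) {
  console.assert(intToIp(ipToInt(ip)) === ip, `round-trip failed for ${ip}`);
}
console.log(ipToInt('8.8.8.8')); // 134744072
```

The trailing `>>> 0` matters: without it, addresses at or above 128.0.0.0 come out negative because JavaScript's shift operators work on signed 32-bit integers.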
Step 4: Create Ingestion Service
src/services/ingestion-service.js
const fs = require('fs');
const Papa = require('papaparse');
const { dynamodb, TABLE_NAME } = require('../config/aws-config');
const IPProcessor = require('../models/ip-processor');
class IngestionService {
static async ingestFromCSV(filePath) {
console.log(`Starting ingestion from ${filePath}`);
// Read and parse CSV
const csvContent = fs.readFileSync(filePath, 'utf8');
const columnNames = [
'inetnum', 'inetnumFirst', 'inetnumLast', 'as_number', 'as_name',
'as_route', 'as_domain', 'netname', 'modified', 'country', 'city',
'org_id', 'abuse_contacts', 'admin_contacts', 'tech_contacts',
'maintainers', 'domain_maintainers', 'lower_maintainers',
'routes_maintainers', 'source', 'remarks', 'as_type', 'parent'
];
const parsedData = Papa.parse(csvContent, {
header: false,
dynamicTyping: true,
skipEmptyLines: true
});
// Transform data
const transformedData = parsedData.data.map(row => {
  const record = {};
  columnNames.forEach((colName, index) => {
    // Use ?? rather than || so legitimate falsy values (e.g. AS number 0) survive;
    // blank CSV fields are still stored as null
    const value = row[index];
    record[colName] = value === '' ? null : value ?? null;
  });
  return IPProcessor.transformRecord(record);
});
console.log(`Transformed ${transformedData.length} records`);
// Batch write to DynamoDB
await this.batchWriteItems(transformedData);
console.log('Ingestion completed successfully');
}
static async batchWriteItems(items) {
  const batchSize = 25; // BatchWriteItem accepts at most 25 put/delete requests per call
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  console.log(`Processing ${batches.length} batches...`);
  for (let i = 0; i < batches.length; i++) {
    let requestItems = {
      [TABLE_NAME]: batches[i].map(item => ({ PutRequest: { Item: item } }))
    };
    // Under load, BatchWriteItem can succeed while returning UnprocessedItems;
    // resubmit those with exponential backoff until everything is written
    let attempt = 0;
    while (Object.keys(requestItems).length > 0) {
      const result = await dynamodb.batchWrite({ RequestItems: requestItems }).promise();
      requestItems = result.UnprocessedItems || {};
      if (Object.keys(requestItems).length > 0) {
        await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt++) * 100));
      }
    }
    console.log(`Batch ${i + 1}/${batches.length} completed`);
  }
}
}
module.exports = IngestionService;
Step 5: Create Lookup Service
src/services/lookup-service.js
const { dynamodb, TABLE_NAME } = require('../config/aws-config');
const IPProcessor = require('../models/ip-processor');
class LookupService {
static async findByIpAddress(ipAddress) {
  const ipInt = IPProcessor.ipToInt(ipAddress);
  // NOTE: Scan with a FilterExpression reads (and bills for) the entire table and
  // returns at most 1 MB per page (paginate with LastEvaluatedKey for full coverage).
  // This is fine for testing; production lookups should go through the sharded
  // lookup table described in Database Design.
  const params = {
    TableName: TABLE_NAME,
    FilterExpression: '#start <= :ip AND #end >= :ip',
    ExpressionAttributeNames: {
      '#start': 'ip_range_start',
      '#end': 'ip_range_end'
    },
    ExpressionAttributeValues: {
      ':ip': ipInt
    }
  };
  const result = await dynamodb.scan(params).promise();
  return result.Items;
}
static async queryByASN(asn) {
const params = {
TableName: TABLE_NAME,
IndexName: 'ASN-Index',
KeyConditionExpression: 'as_number = :asn',
ExpressionAttributeValues: {
':asn': asn
}
};
const result = await dynamodb.query(params).promise();
return result.Items;
}
static async queryByCountry(country) {
const params = {
TableName: TABLE_NAME,
IndexName: 'Country-Index',
KeyConditionExpression: 'country = :country',
ExpressionAttributeValues: {
':country': country
}
};
const result = await dynamodb.query(params).promise();
return result.Items;
}
}
module.exports = LookupService;
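Once the lookup table is populated, an IP lookup becomes a single-partition Query instead of a full-table Scan. A sketch of building the request parameters, assuming the shard/key format from Database Design (the `9999999999` suffix is a lexicographic sentinel so the condition compares only the range start; the helper name is illustrative):

```javascript
// Convert a dotted-quad IPv4 address to a 32-bit unsigned integer
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
}

// Build Query params that fetch candidate ranges in the IP's shard whose start
// is at or below the target address; the caller still verifies ip <= range end.
function buildShardQuery(tableName, ipAddress) {
  const shard = ipAddress.split('.').slice(0, 2).join('.');
  const padded = String(ipToInt(ipAddress)).padStart(10, '0');
  return {
    TableName: tableName,
    KeyConditionExpression: 'ip_shard = :shard AND ip_range_key <= :key',
    ExpressionAttributeValues: { ':shard': shard, ':key': `${padded}#9999999999` },
    ScanIndexForward: false, // descending: ranges starting closest below the IP come first
    Limit: 5
  };
}

console.log(buildShardQuery('ip-netblocks-lookup', '1.0.194.100'));
```

The resulting params would be passed to `dynamodb.query(params).promise()`, and the caller filters the few candidates on `ip_range_end >= ipToInt(ip)`.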
Testing
Step 1: Create Test File
tests/lookup-service.test.js
const LookupService = require('../src/services/lookup-service');
describe('LookupService', () => {
test('should find netblock for IP address', async () => {
const results = await LookupService.findByIpAddress('1.0.194.100');
expect(results).toBeDefined();
expect(Array.isArray(results)).toBe(true);
});
test('should query by ASN', async () => {
const results = await LookupService.queryByASN(23969);
expect(results).toBeDefined();
expect(Array.isArray(results)).toBe(true);
});
});
Step 2: Run Tests
npm test
Production Deployment
1. Create Deployment Script
scripts/deploy.js
const { createTables } = require('./create-tables');
const IngestionService = require('../src/services/ingestion-service');
async function deploy() {
console.log('Starting production deployment...');
// Create tables
await createTables();
// Initial data load
if (process.env.INITIAL_DATA_FILE) {
await IngestionService.ingestFromCSV(process.env.INITIAL_DATA_FILE);
}
console.log('Deployment completed successfully');
}
if (require.main === module) {
deploy().catch(console.error);
}
2. Set Up Daily Batch Processing
scripts/daily-batch.js
const IngestionService = require('../src/services/ingestion-service');
async function dailyBatch() {
const dataFile = process.env.DAILY_DATA_FILE || './data/daily-update.csv';
console.log('Starting daily batch process...');
const startTime = Date.now();
await IngestionService.ingestFromCSV(dataFile);
const endTime = Date.now();
console.log(`Daily batch completed in ${(endTime - startTime) / 1000} seconds`);
}
if (require.main === module) {
dailyBatch().catch(console.error);
}
3. Create Cron Job
# Edit crontab
crontab -e
# Add daily batch job at 2 AM
0 2 * * * cd /path/to/ip-netblock-system && node scripts/daily-batch.js
Performance Optimization
1. Monitor Read/Write Capacity
- Use CloudWatch to monitor table metrics
- Set up auto-scaling if using provisioned capacity
- Consider switching to on-demand billing for variable workloads
2. Optimize Queries
- Use the lookup table for frequent IP queries
- Implement caching for common lookups
- Consider using ElastiCache for hot data
3. Batch Operations
- Always use batch operations for bulk inserts
- Implement exponential backoff for throttling
- Process large datasets in chunks
Monitoring
1. CloudWatch Metrics
Monitor these key metrics:
- Table read/write capacity utilization
- Throttled requests
- Item count
- Table size
2. Application Metrics
// Add to your lookup service (e.g. src/services/metrics-service.js)
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch(); // picks up the region set in aws-config
class MetricsService {
  static async recordLookup(resultCount, duration) {
    const params = {
      Namespace: 'IPNetblock/Lookups',
      MetricData: [
        {
          MetricName: 'LookupDuration',
          Value: duration,
          Unit: 'Milliseconds'
        },
        {
          MetricName: 'ResultCount',
          Value: resultCount,
          Unit: 'Count'
        }
      ]
    };
    await cloudwatch.putMetricData(params).promise();
  }
}
module.exports = MetricsService;
Troubleshooting
Common Issues
1. Throttling Errors
// Implement exponential backoff
const retryWithBackoff = async (operation, maxRetries = 3) => {
for (let i = 0; i < maxRetries; i++) {
try {
return await operation();
} catch (error) {
if (error.code === 'ProvisionedThroughputExceededException' && i < maxRetries - 1) {
const delay = Math.pow(2, i) * 1000;
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
};
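To exercise the backoff without touching AWS, the helper can be run against a mock operation that throttles twice before succeeding. The configurable `baseDelay` parameter is an addition for this demo (so it finishes quickly), not part of the helper above:

```javascript
// Same shape as the helper above, plus a configurable base delay for the demo
const retryWithBackoff = async (operation, maxRetries = 3, baseDelay = 1000) => {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await operation();
    } catch (error) {
      if (error.code === 'ProvisionedThroughputExceededException' && i < maxRetries - 1) {
        await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * baseDelay));
        continue;
      }
      throw error;
    }
  }
};

// Mock operation: throttled twice, then succeeds on the third call
let calls = 0;
const flaky = async () => {
  if (++calls < 3) {
    const err = new Error('throttled');
    err.code = 'ProvisionedThroughputExceededException';
    throw err;
  }
  return 'ok';
};

const demo = retryWithBackoff(flaky, 3, 10);
demo.then(result => console.log(result, `after ${calls} calls`)); // "ok after 3 calls"
```

In production the wrapped operation would be a real call, e.g. `retryWithBackoff(() => dynamodb.batchWrite(params).promise())`.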
2. Large IP Range Queries
- Implement pagination for large result sets
- Use parallel queries across multiple shards
- Consider using DynamoDB Streams for real-time updates
3. Memory Issues
- Process large CSV files in chunks
- Use streaming parsers for very large datasets
- Implement garbage collection hints
Debugging Commands
# Check table status
aws dynamodb describe-table --table-name ip-netblocks
# Count items without retrieving attributes (note: still scans the whole table)
aws dynamodb scan --table-name ip-netblocks --select COUNT
# Test individual operations
node -e "
const LookupService = require('./src/services/lookup-service');
LookupService.findByIpAddress('8.8.8.8').then(console.log);
"
Cost Optimization
Estimated Costs (Monthly)
- Small dataset (100K records): $10-20
- Medium dataset (1M records): $30-60
- Large dataset (10M+ records): $100-300
Cost Reduction Tips
- Use on-demand billing for variable workloads
- Implement data archiving for old records
- Use DynamoDB TTL for automatic cleanup
- Optimize query patterns to reduce scan operations
- Consider data compression for large text fields
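DynamoDB TTL deletes items whose designated attribute holds a Unix timestamp (in seconds) that has passed. A sketch of stamping records with a 90-day expiry at ingest time; the `expire_at` attribute name is a choice for this guide, and TTL must also be enabled on the table (e.g. via `aws dynamodb update-time-to-live`):

```javascript
// Compute an epoch-seconds expiry N days in the future. TTL attributes must be
// plain Numbers holding Unix time in seconds, not ISO date strings.
function ttlInDays(days, now = Date.now()) {
  return Math.floor(now / 1000) + days * 24 * 60 * 60;
}

// At ingest time, stamp each transformed record before it is written:
// item.expire_at = ttlInDays(90);
console.log(ttlInDays(90, 0)); // 7776000 — 90 days expressed in seconds
```

TTL deletion is background and approximate (it can lag expiry), so queries that must exclude stale data should still filter on the attribute.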
Next Step Recommendations
- API Layer: Add REST API using Express.js or AWS Lambda
- Authentication: Implement API keys or JWT tokens
- Rate Limiting: Add request throttling
- Caching: Implement Redis/ElastiCache layer
- Monitoring: Set up comprehensive logging and alerting
- Documentation: Create API documentation with Swagger