Ingesting IP Netblocks with DynamoDB

Introduction

Overview

DynamoDB is Amazon's fully managed NoSQL database service that's designed for high-performance applications at scale. DynamoDB is known for its single-digit millisecond response times, automatic scaling, and built-in security features.

Key characteristics include:

  • Data Model: Uses a key-value and document data model with tables, items, and attributes. Each item must have a primary key (either a partition key alone, or a partition key plus sort key).
  • Performance: Offers consistent, fast performance with automatic scaling up or down based on traffic patterns.
  • Availability: Provides multi-Region replication through global tables and built-in fault tolerance.
  • Consistency Models: Supports both eventually consistent reads (default) and strongly consistent reads.
  • Common Use Cases: Session management, gaming leaderboards, IoT data storage, mobile app backends, and real-time analytics.
  • Integration: Works seamlessly with other AWS services like Lambda, API Gateway, and DynamoDB Streams for real-time data processing.

IP netblock data from WHOISXMLAPI benefits from DynamoDB's fast lookups and scalability, especially if you plan to run IP range queries or per-IP lookups.

IP Netblock Questions and Use Cases

Data Feed Questions You Should Prepare For:

  • What format do you plan to use: JSON or CSV?
  • How large is the dataset per day, week, and month?
  • How frequently do you plan to update it?
  • Which key fields in each record do you plan to use (IP ranges, organization info, ASN, etc.)?

Plan Your Use Case:

  • What types of queries will you be running? (IP lookups, range searches, organization queries?)
  • Do you need real-time ingestion or batch processing?
  • Are you planning to query by specific IP addresses to find which netblock they belong to?

Technical Considerations: The main challenge with IP netblock data in DynamoDB is that netblocks are inherently range-based, while DynamoDB excels at exact key lookups. Given that mismatch, you'll need to think carefully about:

  • Primary Key Design - How to structure partition/sort keys for efficient queries
  • IP Range Representation - Converting CIDR blocks to queryable formats
  • Ingestion Strategy - Batch writes vs streaming
  • Query Patterns - Supporting both exact matches and range queries
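
For example, the "IP Range Representation" point above usually means turning each CIDR block into an integer start/end pair before it ever reaches DynamoDB. A minimal sketch (the helper names are illustrative, not part of any feed or SDK):

```javascript
// Sketch: convert a CIDR block into the integer start/end pair used as
// table keys later in this guide. Helper names are illustrative.
function ipToInt(ip) {
    // Dotted-quad IPv4 -> unsigned 32-bit integer
    return ip.split('.').reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
}

function cidrToRange(cidr) {
    const [base, bits] = cidr.split('/');
    const size = Math.pow(2, 32 - parseInt(bits, 10)); // number of addresses in the block
    const start = ipToInt(base);                       // assumes base is the aligned network address
    return { start, end: start + size - 1 };
}

console.log(cidrToRange('1.0.194.0/24')); // { start: 16826880, end: 16827135 }
```

With ranges stored this way, "which netblock contains IP x?" becomes an integer comparison rather than string parsing at query time.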

DynamoDB IP Netblock Implementation Guide

This guide walks you through implementing a scalable IP netblock lookup system using AWS DynamoDB and the WHOISXMLAPI data feed.

Table of Contents

  • Prerequisites
  • AWS Setup
  • Project Setup
  • Database Design
  • Implementation Steps
  • Testing
  • Production Deployment
  • Performance Optimization
  • Monitoring
  • Troubleshooting

Prerequisites

Technical Requirements

  • Node.js 16+ installed
  • AWS CLI configured
  • Git for version control
  • Basic knowledge of JavaScript, AWS, and DynamoDB

AWS Requirements

  • AWS Account with appropriate permissions
  • IAM user with DynamoDB access
  • Estimated monthly cost: $20-100 depending on data size and query volume

Data Requirements

  • Access to WHOISXMLAPI IP Netblock data feed
  • Sample CSV file for testing

AWS Setup

1. Create IAM User

# Create IAM user for the application

aws iam create-user --user-name ip-netblock-service

# Create access key

aws iam create-access-key --user-name ip-netblock-service

2. Create IAM Policy

Create a file dynamodb-policy.json:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:PutItem",
                "dynamodb:BatchWriteItem",
                "dynamodb:GetItem",
                "dynamodb:Query",
                "dynamodb:Scan",
                "dynamodb:UpdateItem",
                "dynamodb:DeleteItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:*:*:table/ip-netblocks*"
            ]
        }
    ]
}

Attach the policy:

aws iam put-user-policy --user-name ip-netblock-service --policy-name DynamoDBIPNetblockPolicy --policy-document file://dynamodb-policy.json

3. Configure AWS Credentials

aws configure

# Enter your access key, secret key, and region (e.g., us-east-1)

Project Setup

1. Initialize Project

mkdir ip-netblock-system

cd ip-netblock-system

npm init -y

2. Install Dependencies

npm install aws-sdk papaparse dotenv

npm install --save-dev jest nodemon

Note: This guide uses the AWS SDK for JavaScript v2 (aws-sdk). The v3 modular clients (@aws-sdk/client-dynamodb and @aws-sdk/lib-dynamodb) also work, with minor code changes.

3. Project Structure

ip-netblock-system/
├── src/
│   ├── config/
│   │   └── aws-config.js
│   ├── models/
│   │   ├── ip-processor.js
│   │   └── dynamodb-operations.js
│   ├── services/
│   │   ├── ingestion-service.js
│   │   └── lookup-service.js
│   └── utils/
│       └── logger.js
├── scripts/
│   ├── create-tables.js
│   ├── ingest-data.js
│   └── daily-batch.js
├── tests/
│   └── lookup-service.test.js
├── data/
│   └── sample-data.csv
├── .env
├── .gitignore
└── package.json

4. Environment Configuration

Create a .env file (keep it in .gitignore; never commit credentials):

AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
DYNAMODB_TABLE_NAME=ip-netblocks
DYNAMODB_LOOKUP_TABLE_NAME=ip-netblocks-lookup
LOG_LEVEL=info

Database Design

Main Table Schema

Table: ip-netblocks
Primary Key: 
  - Partition Key: ip_range_start (Number)
  - Sort Key: ip_range_end (Number)

Global Secondary Indexes:
  - ASN-Index: as_number (Number)
  - Country-Index: country (String)

Attributes: All netblock data from WHOISXMLAPI

Lookup Table Schema (for fast queries)

Table: ip-netblocks-lookup
Primary Key:
  - Partition Key: ip_shard (String) - First 2 octets
  - Sort Key: ip_range_key (String) - Sortable range identifier

Purpose: Enable fast IP lookups by sharding
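
One possible encoding for these two keys, assuming the scheme above (shard on the first two octets, sort by a zero-padded start address). The exact key formats are an implementation choice, not a WHOISXMLAPI convention:

```javascript
// Illustrative key builder for the lookup table: shard on the first two
// octets, sort on a zero-padded "start#end" string so ranges sort by
// their start address within each shard.
function buildLookupKeys(startIp, endIp) {
    const pad = ip => ip.split('.').map(octet => octet.padStart(3, '0')).join('.');
    const [a, b] = startIp.split('.');
    return {
        ip_shard: `${a}.${b}`,                          // partition key, e.g. "1.0"
        ip_range_key: `${pad(startIp)}#${pad(endIp)}`   // lexicographically sortable sort key
    };
}

console.log(buildLookupKeys('1.0.194.0', '1.0.194.255'));
// { ip_shard: '1.0', ip_range_key: '001.000.194.000#001.000.194.255' }
```

To resolve a single IP, query its shard for range keys at or below the padded address and verify the candidate's end bound, which is far cheaper than scanning the main table.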

Implementation Steps

Step 1: Create Configuration Files

src/config/aws-config.js

const AWS = require('aws-sdk');
require('dotenv').config();

AWS.config.update({
    region: process.env.AWS_REGION,
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY
});

const dynamodb = new AWS.DynamoDB.DocumentClient();
const dynamodbClient = new AWS.DynamoDB();

module.exports = {
    dynamodb,
    dynamodbClient,
    TABLE_NAME: process.env.DYNAMODB_TABLE_NAME,
    LOOKUP_TABLE_NAME: process.env.DYNAMODB_LOOKUP_TABLE_NAME
};

Step 2: Create Tables

scripts/create-tables.js

const { dynamodbClient, TABLE_NAME, LOOKUP_TABLE_NAME } = require('../src/config/aws-config');

async function createTables() {
    // Create main table
    const mainTableParams = {
        TableName: TABLE_NAME,
        KeySchema: [
            { AttributeName: 'ip_range_start', KeyType: 'HASH' },
            { AttributeName: 'ip_range_end', KeyType: 'RANGE' }
        ],
        AttributeDefinitions: [
            { AttributeName: 'ip_range_start', AttributeType: 'N' },
            { AttributeName: 'ip_range_end', AttributeType: 'N' },
            { AttributeName: 'as_number', AttributeType: 'N' },
            { AttributeName: 'country', AttributeType: 'S' }
        ],
        BillingMode: 'PAY_PER_REQUEST',
        GlobalSecondaryIndexes: [
            {
                IndexName: 'ASN-Index',
                KeySchema: [{ AttributeName: 'as_number', KeyType: 'HASH' }],
                Projection: { ProjectionType: 'ALL' }
            },
            {
                IndexName: 'Country-Index',
                KeySchema: [{ AttributeName: 'country', KeyType: 'HASH' }],
                Projection: { ProjectionType: 'ALL' }
            }
        ]
    };

    // Create lookup table
    const lookupTableParams = {
        TableName: LOOKUP_TABLE_NAME,
        KeySchema: [
            { AttributeName: 'ip_shard', KeyType: 'HASH' },
            { AttributeName: 'ip_range_key', KeyType: 'RANGE' }
        ],
        AttributeDefinitions: [
            { AttributeName: 'ip_shard', AttributeType: 'S' },
            { AttributeName: 'ip_range_key', AttributeType: 'S' }
        ],
        BillingMode: 'PAY_PER_REQUEST'
    };

    try {
        await dynamodbClient.createTable(mainTableParams).promise();
        console.log(`Created table: ${TABLE_NAME}`);
        
        await dynamodbClient.createTable(lookupTableParams).promise();
        console.log(`Created table: ${LOOKUP_TABLE_NAME}`);
        
        // Wait for tables to be active
        await dynamodbClient.waitFor('tableExists', { TableName: TABLE_NAME }).promise();
        await dynamodbClient.waitFor('tableExists', { TableName: LOOKUP_TABLE_NAME }).promise();
        
        console.log('All tables are active');
    } catch (error) {
        console.error('Error creating tables:', error);
    }
}

if (require.main === module) {
    createTables();
}

module.exports = { createTables };

Step 3: Implement Data Processing

src/models/ip-processor.js

class IPProcessor {
    // Convert a dotted-quad IPv4 address to an unsigned 32-bit integer
    static ipToInt(ip) {
        return ip.split('.').reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
    }
    
    // Convert an unsigned 32-bit integer back to a dotted-quad string
    static intToIp(int) {
        return [
            (int >>> 24) & 255,
            (int >>> 16) & 255,
            (int >>> 8) & 255,
            int & 255
        ].join('.');
    }
    
    // Parse an inetnum range string, e.g. "1.0.194.0 - 1.0.194.255"
    static parseIpRange(rangeStr) {
        const [startIp, endIp] = rangeStr.split(' - ');
        return {
            startIp: startIp.trim(),
            endIp: endIp.trim(),
            startInt: this.ipToInt(startIp.trim()),
            endInt: this.ipToInt(endIp.trim())
        };
    }
    
    static transformRecord(record) {
        const ipRange = this.parseIpRange(record.inetnum);
        
        return {
            ip_range_start: ipRange.startInt,
            ip_range_end: ipRange.endInt,
            inetnum: record.inetnum,
            start_ip: ipRange.startIp,
            end_ip: ipRange.endIp,
            as_number: record.as_number,
            as_name: record.as_name,
            as_route: record.as_route,
            as_domain: record.as_domain,
            as_type: record.as_type,
            netname: record.netname,
            country: record.country,
            city: record.city,
            org_id: record.org_id,
            source: record.source,
            modified: record.modified,
            ingested_at: new Date().toISOString()
        };
    }
}

module.exports = IPProcessor;
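
A quick sanity check of the conversion logic, with the two helpers restated inline so the snippet runs standalone:

```javascript
// Mirrors IPProcessor.ipToInt / intToIp above, restated so this runs on its own.
function ipToInt(ip) {
    return ip.split('.').reduce((acc, octet) => (acc << 8) + parseInt(octet, 10), 0) >>> 0;
}

function intToIp(int) {
    return [(int >>> 24) & 255, (int >>> 16) & 255, (int >>> 8) & 255, int & 255].join('.');
}

console.log(ipToInt('1.2.3.4'));                  // 16909060
console.log(intToIp(16909060));                   // "1.2.3.4"
console.log(intToIp(ipToInt('255.255.255.255'))); // "255.255.255.255" (round-trips the max address)
```

The `>>> 0` at the end of ipToInt is what keeps addresses above 128.0.0.0 unsigned; without it, JavaScript's 32-bit shift would produce negative keys.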

Step 4: Create Ingestion Service

src/services/ingestion-service.js

const fs = require('fs');
const Papa = require('papaparse');
const { dynamodb, TABLE_NAME } = require('../config/aws-config');
const IPProcessor = require('../models/ip-processor');

class IngestionService {
    static async ingestFromCSV(filePath) {
        console.log(`Starting ingestion from ${filePath}`);
        
        // Read and parse CSV
        const csvContent = fs.readFileSync(filePath, 'utf8');
        const columnNames = [
            'inetnum', 'inetnumFirst', 'inetnumLast', 'as_number', 'as_name',
            'as_route', 'as_domain', 'netname', 'modified', 'country', 'city',
            'org_id', 'abuse_contacts', 'admin_contacts', 'tech_contacts',
            'maintainers', 'domain_maintainers', 'lower_maintainers',
            'routes_maintainers', 'source', 'remarks', 'as_type', 'parent'
        ];
        
        const parsedData = Papa.parse(csvContent, {
            header: false,
            dynamicTyping: true,
            skipEmptyLines: true
        });
        
        // Transform data
        const transformedData = parsedData.data.map(row => {
            const record = {};
            columnNames.forEach((colName, index) => {
                record[colName] = row[index] ?? null; // '??' only nulls missing cells; '||' would also drop legitimate 0 values
            });
            return IPProcessor.transformRecord(record);
        });
        
        console.log(`Transformed ${transformedData.length} records`);
        
        // Batch write to DynamoDB
        await this.batchWriteItems(transformedData);
        
        console.log('Ingestion completed successfully');
    }
    
    static async batchWriteItems(items) {
        const batchSize = 25; // DynamoDB BatchWriteItem limit per request
        const batches = [];
        
        for (let i = 0; i < items.length; i += batchSize) {
            batches.push(items.slice(i, i + batchSize));
        }
        
        console.log(`Processing ${batches.length} batches...`);
        
        for (let i = 0; i < batches.length; i++) {
            const batch = batches[i];
            const putRequests = batch.map(item => ({
                PutRequest: { Item: item }
            }));
            
            // BatchWriteItem can partially succeed; retry anything DynamoDB
            // reports back as unprocessed instead of silently dropping it.
            let requestItems = { [TABLE_NAME]: putRequests };
            while (requestItems && Object.keys(requestItems).length > 0) {
                const result = await dynamodb.batchWrite({ RequestItems: requestItems }).promise();
                requestItems = result.UnprocessedItems;
                if (requestItems && Object.keys(requestItems).length > 0) {
                    await new Promise(resolve => setTimeout(resolve, 200)); // brief pause before retrying
                }
            }
            
            console.log(`Batch ${i + 1}/${batches.length} completed`);
        }
    }
}

module.exports = IngestionService;

Step 5: Create Lookup Service

src/services/lookup-service.js

const { dynamodb, TABLE_NAME } = require('../config/aws-config');
const IPProcessor = require('../models/ip-processor');

class LookupService {
    static async findByIpAddress(ipAddress) {
        const ipInt = IPProcessor.ipToInt(ipAddress);
        
        // Note: Scan reads the entire table on every call, so this is only
        // suitable for testing or small datasets. Production point lookups
        // should go through the sharded lookup table instead.
        const params = {
            TableName: TABLE_NAME,
            FilterExpression: '#start <= :ip AND #end >= :ip',
            ExpressionAttributeNames: {
                '#start': 'ip_range_start',
                '#end': 'ip_range_end'
            },
            ExpressionAttributeValues: {
                ':ip': ipInt
            }
        };
        
        const result = await dynamodb.scan(params).promise();
        return result.Items;
    }
    
    static async queryByASN(asn) {
        const params = {
            TableName: TABLE_NAME,
            IndexName: 'ASN-Index',
            KeyConditionExpression: 'as_number = :asn',
            ExpressionAttributeValues: {
                ':asn': asn
            }
        };
        
        const result = await dynamodb.query(params).promise();
        return result.Items;
    }
    
    static async queryByCountry(country) {
        const params = {
            TableName: TABLE_NAME,
            IndexName: 'Country-Index',
            KeyConditionExpression: 'country = :country',
            ExpressionAttributeValues: {
                ':country': country
            }
        };
        
        const result = await dynamodb.query(params).promise();
        return result.Items;
    }
}

module.exports = LookupService;

Testing

Step 1: Create Test File

tests/lookup-service.test.js

const LookupService = require('../src/services/lookup-service');

describe('LookupService', () => {
    test('should find netblock for IP address', async () => {
        const results = await LookupService.findByIpAddress('1.0.194.100');
        expect(results).toBeDefined();
        expect(Array.isArray(results)).toBe(true);
    });
    
    test('should query by ASN', async () => {
        const results = await LookupService.queryByASN(23969);
        expect(results).toBeDefined();
        expect(Array.isArray(results)).toBe(true);
    });
});

Step 2: Run Tests

npm test

Production Deployment

1. Create Deployment Script

scripts/deploy.js

const { createTables } = require('./create-tables');
const IngestionService = require('../src/services/ingestion-service');

async function deploy() {
    console.log('Starting production deployment...');
    
    // Create tables
    await createTables();
    
    // Initial data load
    if (process.env.INITIAL_DATA_FILE) {
        await IngestionService.ingestFromCSV(process.env.INITIAL_DATA_FILE);
    }
    
    console.log('Deployment completed successfully');
}

if (require.main === module) {
    deploy().catch(console.error);
}

2. Set Up Daily Batch Processing

scripts/daily-batch.js

const IngestionService = require('../src/services/ingestion-service');

async function dailyBatch() {
    const dataFile = process.env.DAILY_DATA_FILE || './data/daily-update.csv';
    
    console.log('Starting daily batch process...');
    const startTime = Date.now();
    
    await IngestionService.ingestFromCSV(dataFile);
    
    const endTime = Date.now();
    console.log(`Daily batch completed in ${(endTime - startTime) / 1000} seconds`);
}

if (require.main === module) {
    dailyBatch().catch(console.error);
}

3. Create Cron Job

# Edit crontab
crontab -e

# Add daily batch job at 2 AM
0 2 * * * cd /path/to/ip-netblock-system && node scripts/daily-batch.js

Performance Optimization

1. Monitor Read/Write Capacity

  • Use CloudWatch to monitor table metrics
  • The tables in this guide use on-demand (PAY_PER_REQUEST) billing; if you switch to provisioned capacity, set up auto-scaling
  • On-demand billing suits variable or unpredictable workloads

2. Optimize Queries

  • Use the lookup table for frequent IP queries
  • Implement caching for common lookups
  • Consider using ElastiCache for hot data
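
As a starting point before reaching for ElastiCache, a minimal in-process cache for hot lookups might look like this (the 5-minute TTL is an arbitrary example):

```javascript
// Minimal in-process TTL cache sketch for hot IP lookups. A shared cache
// such as ElastiCache/Redis is the production option across instances.
class LookupCache {
    constructor(ttlMs = 5 * 60 * 1000) {
        this.ttlMs = ttlMs;
        this.entries = new Map();
    }

    get(ip) {
        const entry = this.entries.get(ip);
        if (!entry) return null;
        if (Date.now() - entry.storedAt > this.ttlMs) {
            this.entries.delete(ip); // expired; evict and treat as a miss
            return null;
        }
        return entry.value;
    }

    set(ip, value) {
        this.entries.set(ip, { value, storedAt: Date.now() });
    }
}
```

LookupService.findByIpAddress could check `cache.get(ipAddress)` first and call `cache.set(ipAddress, results)` after a DynamoDB hit.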

3. Batch Operations

  • Always use batch operations for bulk inserts
  • Implement exponential backoff for throttling
  • Process large datasets in chunks

Monitoring

1. CloudWatch Metrics

Monitor these key metrics:

  • Table read/write capacity utilization
  • Throttled requests
  • Item count
  • Table size

2. Application Metrics

// Add to your lookup service
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

class MetricsService {
    static async recordLookup(ipAddress, resultCount, duration) {
        const params = {
            Namespace: 'IPNetblock/Lookups',
            MetricData: [
                {
                    MetricName: 'LookupDuration',
                    Value: duration,
                    Unit: 'Milliseconds'
                },
                {
                    MetricName: 'ResultCount',
                    Value: resultCount,
                    Unit: 'Count'
                }
            ]
        };
        
        await cloudwatch.putMetricData(params).promise();
    }
}

Troubleshooting

Common Issues

1. Throttling Errors

// Implement exponential backoff
const retryWithBackoff = async (operation, maxRetries = 3) => {
    for (let i = 0; i < maxRetries; i++) {
        try {
            return await operation();
        } catch (error) {
            if (error.code === 'ProvisionedThroughputExceededException' && i < maxRetries - 1) {
                const delay = Math.pow(2, i) * 1000;
                await new Promise(resolve => setTimeout(resolve, delay));
                continue;
            }
            throw error;
        }
    }
};
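
A standalone demo of the helper (restated with a configurable base delay so the example finishes quickly; keep the 1000 ms base in production):

```javascript
// Restated from the helper above, with a configurable base delay.
const retryWithBackoff = async (operation, maxRetries = 3, baseDelayMs = 1000) => {
    for (let i = 0; i < maxRetries; i++) {
        try {
            return await operation();
        } catch (error) {
            if (error.code === 'ProvisionedThroughputExceededException' && i < maxRetries - 1) {
                await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * baseDelayMs));
                continue;
            }
            throw error;
        }
    }
};

// Simulate an operation that is throttled twice, then succeeds.
// In production, pass () => dynamodb.batchWrite(params).promise() instead.
let attempts = 0;
const flakyWrite = async () => {
    attempts++;
    if (attempts < 3) {
        const err = new Error('throttled');
        err.code = 'ProvisionedThroughputExceededException';
        throw err;
    }
    return 'ok';
};

retryWithBackoff(flakyWrite, 3, 1).then(result => console.log(result, attempts)); // ok 3
```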

2. Large IP Range Queries

  • Implement pagination for large result sets
  • Use parallel queries across multiple shards
  • Consider using DynamoDB Streams for real-time updates

3. Memory Issues

  • Process large CSV files in chunks
  • Use streaming parsers for very large datasets
  • Avoid holding the full transformed dataset in memory; write each batch as soon as it is parsed

Debugging Commands

# Check table status

aws dynamodb describe-table --table-name ip-netblocks

# Monitor table metrics (DynamoDB publishes to CloudWatch Metrics, not CloudWatch Logs)

aws cloudwatch get-metric-statistics --namespace AWS/DynamoDB --metric-name ConsumedReadCapacityUnits --dimensions Name=TableName,Value=ip-netblocks --statistics Sum --period 300 --start-time $(date -u -d '1 hour ago' +%FT%TZ) --end-time $(date -u +%FT%TZ)

# Test individual operations

node -e "
const LookupService = require('./src/services/lookup-service');
LookupService.findByIpAddress('8.8.8.8').then(console.log);
"

Cost Optimization

Estimated Costs (Monthly)

  • Small dataset (100K records): $10-20
  • Medium dataset (1M records): $30-60
  • Large dataset (10M+ records): $100-300

Cost Reduction Tips

  • Use on-demand billing for variable workloads
  • Implement data archiving for old records
  • Use DynamoDB TTL for automatic cleanup
  • Optimize query patterns to reduce scan operations
  • Consider data compression for large text fields

Next Step Recommendations

  • API Layer: Add REST API using Express.js or AWS Lambda
  • Authentication: Implement API keys or JWT tokens
  • Rate Limiting: Add request throttling
  • Caching: Implement Redis/ElastiCache layer
  • Monitoring: Set up comprehensive logging and alerting
  • Documentation: Create API documentation with Swagger