PostgreSQL Bulk Loading
When working with large amounts of data, importing it efficiently into your PostgreSQL database becomes crucial. This guide covers various methods for bulk loading data into PostgreSQL, with a focus on performance and practical applications.
Introduction
Bulk loading refers to the process of inserting large volumes of data into a database at once, rather than row by row. PostgreSQL offers several efficient methods for bulk loading that significantly outperform standard INSERT statements when dealing with large datasets.
Why is bulk loading important?
- Performance: Load millions of rows in seconds instead of hours
- Reduced overhead: Minimizes transaction overhead and logging
- Resource efficiency: Uses less memory and CPU compared to individual inserts
 
The COPY Command
The COPY command is PostgreSQL's primary tool for bulk loading data. It's designed specifically for high-performance data imports and exports.
Basic Syntax
COPY table_name [ (column_name [, ...]) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
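The same command also works in the other direction for exports. A minimal sketch, assuming the employees table created in the next example and a hypothetical output path:
COPY employees TO '/path/to/employees_export.csv'
WITH (FORMAT CSV, HEADER);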
Loading Data from a CSV File
Let's say we have a CSV file named employees.csv with the following content:
id,first_name,last_name,email,hire_date
1,John,Smith,[email protected],2022-01-15
2,Mary,Johnson,[email protected],2022-02-20
3,Robert,Williams,[email protected],2022-03-05
And we want to load it into an employees table:
-- First create the table
CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    hire_date DATE
);
-- Then use COPY to load the data
COPY employees(id, first_name, last_name, email, hire_date)
FROM '/path/to/employees.csv'
WITH (FORMAT CSV, HEADER);
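One caveat with this particular example: id is a SERIAL column, and COPY loads the explicit values from the file without advancing the column's sequence, so a later INSERT that relies on the default could collide with an imported id. Resetting the sequence afterwards brings it back in sync:
-- Align the id sequence with the highest imported value
SELECT setval(
    pg_get_serial_sequence('employees', 'id'),
    (SELECT MAX(id) FROM employees)
);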
Options for COPY
The COPY command supports numerous options for flexibility:
COPY table_name FROM '/path/to/file.csv' WITH (
    FORMAT CSV,           -- Format (CSV, TEXT, BINARY)
    HEADER,               -- Skip the first line (header)
    DELIMITER ',',        -- Column separator (default is tab for TEXT)
    QUOTE '"',            -- Quote character
    ESCAPE '\',           -- Escape character
    NULL 'NULL',          -- String representing NULL values
    ENCODING 'UTF8'       -- File encoding
);
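The PROGRAM form shown in the basic syntax reads from a command's output instead of a regular file, which is handy for compressed exports. A sketch, assuming the compressed file and the gunzip utility exist on the database server (this form also requires server-side execution privileges):
-- Decompress on the fly and load the result
COPY employees FROM PROGRAM 'gunzip -c /path/to/employees.csv.gz'
WITH (FORMAT CSV, HEADER);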
Using \copy with psql
While COPY is powerful, it requires server file system access. For client-side file loading, PostgreSQL's command-line utility psql provides the \copy meta-command:
\copy employees FROM '/local/path/to/employees.csv' WITH (FORMAT CSV, HEADER)
The \copy command runs on the client and sends the data to the server, so it works with files on your local machine.
Key differences between COPY and \copy:
- COPY runs server-side and is faster, but requires server file system access
- \copy runs client-side and works with local files
- The syntax is almost identical, but \copy doesn't need quotes around the filename
Practical Example: Importing a Large Dataset
Let's walk through a complete example of importing a large dataset of customer transactions:
-- Create the table structure
CREATE TABLE transactions (
    transaction_id UUID PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount DECIMAL(10,2) NOT NULL,
    transaction_date TIMESTAMP NOT NULL,
    product_id VARCHAR(20),
    quantity INTEGER,
    status VARCHAR(10)
);
-- Import from CSV using COPY
COPY transactions
FROM '/data/transactions.csv'
WITH (
    FORMAT CSV,
    HEADER,
    DELIMITER ',',
    NULL 'NA'
);
-- Verify the imported data
SELECT COUNT(*) FROM transactions;
Output:
 count
-------
 100000
(1 row)
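Beyond a simple row count, a quick sanity query helps confirm that dates, amounts, and NULLs were parsed as expected; the column names come from the table above, and the values to check against are up to you:
-- Spot-check ranges and NULL handling after the import
SELECT
    MIN(transaction_date) AS earliest,
    MAX(transaction_date) AS latest,
    SUM(amount)           AS total_amount,
    COUNT(*) FILTER (WHERE status IS NULL) AS missing_status
FROM transactions;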
Using COPY with Transformations
Sometimes you need to transform data during import. While the COPY command doesn't directly support transformations, you can combine it with temporary tables and SQL:
-- Create a temporary table with the exact structure of your CSV
CREATE TEMPORARY TABLE temp_users (
    user_id VARCHAR(20),
    username VARCHAR(50),
    signup_date VARCHAR(20),  -- Keep as string initially
    status VARCHAR(10)
);
-- Load raw data into the temporary table
COPY temp_users FROM '/path/to/users.csv' WITH (FORMAT CSV, HEADER);
-- Insert into the final table with transformations
INSERT INTO users (user_id, username, signup_date, status, is_active)
SELECT 
    user_id,
    LOWER(username),  -- Convert username to lowercase
    TO_DATE(signup_date, 'YYYY-MM-DD'),  -- Convert string to date
    status,
    CASE WHEN status = 'active' THEN TRUE ELSE FALSE END  -- Derive boolean field
FROM temp_users;
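For reference, the users table targeted by the INSERT above is assumed to look something like this (a hypothetical definition, including the derived is_active column):
CREATE TABLE users (
    user_id VARCHAR(20) PRIMARY KEY,
    username VARCHAR(50),
    signup_date DATE,
    status VARCHAR(10),
    is_active BOOLEAN
);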
Bulk Loading from Programming Languages
Many programming languages offer efficient ways to perform bulk loading into PostgreSQL.
Python Example with psycopg2
import psycopg2
# Connect to the database
conn = psycopg2.connect("dbname=mydb user=postgres password=secret")
cur = conn.cursor()
# Open the CSV file and stream it to the server
with open('products.csv', 'r') as f:
    # copy_expert runs a COPY ... FROM STDIN command; CSV format handles
    # quoted fields and the header row, which copy_from's text format does not
    cur.copy_expert(
        """
        COPY products (product_id, name, category, price, stock)
        FROM STDIN WITH (FORMAT CSV, HEADER)
        """,
        f
    )
# Commit the transaction
conn.commit()
# Close the connection
cur.close()
conn.close()
Node.js Example with pg-copy-streams
const { Pool } = require('pg');
const fs = require('fs');
const { pipeline } = require('stream');
const copyFrom = require('pg-copy-streams').from;
const pool = new Pool({
  user: 'postgres',
  host: 'localhost',
  database: 'mydb',
  password: 'secret',
  port: 5432,
});
async function bulkLoadData() {
  const client = await pool.connect();
  
  try {
    // Create a readable stream from your CSV file
    const fileStream = fs.createReadStream('orders.csv');
    
    // Create a PostgreSQL COPY stream
    const copyStream = client.query(copyFrom(`
      COPY orders(order_id, customer_id, order_date, total_amount)
      FROM STDIN WITH (FORMAT CSV, HEADER)
    `));
    
    // Pipe the file stream to the copy stream
    await new Promise((resolve, reject) => {
      pipeline(fileStream, copyStream, (err) => {
        if (err) reject(err);
        else resolve();
      });
    });
    
    console.log('Bulk loading completed successfully');
  } finally {
    client.release();
  }
}
bulkLoadData()
  .catch(err => console.error('Error during bulk loading:', err))
  .finally(() => pool.end());
Performance Considerations
To achieve the best performance when bulk loading:
- Disable constraints and indexes during load:
  -- Before loading
  ALTER TABLE mytable DROP CONSTRAINT mytable_pkey;
  DROP INDEX IF EXISTS mytable_idx;
  -- Perform COPY operation
  -- After loading
  ALTER TABLE mytable ADD CONSTRAINT mytable_pkey PRIMARY KEY (id);
  CREATE INDEX mytable_idx ON mytable(column_name);
- Increase maintenance memory:
  SET maintenance_work_mem = '1GB'; -- Adjust based on available RAM
- Use multiple COPY commands in parallel for different chunks of your data
- Use UNLOGGED tables for temporary data loading (see the sketch after this list):
  CREATE UNLOGGED TABLE temp_loading_table (...);
  -- Perform COPY, then move data to regular table if needed
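A sketch of that UNLOGGED staging pattern, with hypothetical table names:
-- The unlogged staging table skips WAL, so the COPY itself is faster
CREATE UNLOGGED TABLE staging_events (LIKE events INCLUDING DEFAULTS);

COPY staging_events FROM '/data/events.csv' WITH (FORMAT CSV, HEADER);

-- Move the rows into the durable table once the load succeeds
INSERT INTO events SELECT * FROM staging_events;
DROP TABLE staging_events;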
Real-world Use Cases
ETL Data Pipeline
In a typical ETL pipeline, upstream jobs extract and transform data into flat files (often CSV), and the load step uses COPY to move each batch into PostgreSQL, frequently through a staging or temporary table like the ones shown earlier.
Database Migration
When migrating from another database system to PostgreSQL:
- Export data from the source database to CSV format
- Create the corresponding table structure in PostgreSQL
- Use COPY to efficiently import the data
- Verify data integrity and create the necessary indexes (see the sketch below)
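A sketch of the verification step, using hypothetical table and column names:
-- Check for duplicate keys before adding the primary key constraint
SELECT customer_id, COUNT(*)
FROM migrated_customers
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- Once the data looks clean, add constraints and indexes
ALTER TABLE migrated_customers ADD PRIMARY KEY (customer_id);
CREATE INDEX ON migrated_customers (email);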
 
Common Problems and Solutions
Handling Errors During Import
-- Create an error table
CREATE TABLE import_errors (
    line_number BIGINT,
    error_message TEXT,
    raw_line TEXT
);
-- Import with error logging
COPY products FROM '/path/to/products.csv' WITH (
    FORMAT CSV,
    HEADER,
    DELIMITER ',',
    FORCE_NULL (description, manufacturer),
    FORCE_NOT_NULL (product_id, name, price),
    LOG_ERRORS INTO import_errors
);
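A sketch of that staging-table pattern, with a hypothetical products_staging table and a made-up validation rule on price:
-- All-TEXT staging table: COPY accepts any well-formed CSV row
CREATE TABLE products_staging (
    product_id TEXT,
    name TEXT,
    price TEXT,
    description TEXT,
    manufacturer TEXT
);

COPY products_staging FROM '/path/to/products.csv' WITH (FORMAT CSV, HEADER);

-- Log rows with a missing or non-numeric price...
INSERT INTO import_errors (error_message, raw_line)
SELECT 'invalid price',
       concat_ws(',', product_id, name, price, description, manufacturer)
FROM products_staging
WHERE price IS NULL OR price !~ '^[0-9]+(\.[0-9]+)?$';

-- ...and load the rest into the real table
INSERT INTO products (product_id, name, price, description, manufacturer)
SELECT product_id, name, price::numeric, description, manufacturer
FROM products_staging
WHERE price ~ '^[0-9]+(\.[0-9]+)?$';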
Dealing with Special Characters
In CSV data, commas inside values are handled by quoting, and quote characters inside quoted values are conventionally doubled (for example, "O""Brien" is read as O"Brien). Setting ESCAPE to the quote character, which is also PostgreSQL's CSV default, matches that convention:
COPY my_table FROM '/path/to/file.csv' WITH (
    FORMAT CSV,
    DELIMITER ',',
    QUOTE '"',
    ESCAPE '"'
);
Summary
PostgreSQL's bulk loading capabilities provide efficient methods for importing large volumes of data:
- The COPY command offers high-performance server-side data loading
- The \copy meta-command provides client-side loading functionality
- Temporary tables allow for data transformations during the import process
- Programming languages can interface with PostgreSQL for bulk operations
By using these tools and following best practices, you can import millions of rows in seconds rather than hours, making your data loading processes more efficient and reliable.
Exercises
- Create a table to store product information and practice importing data using the COPY command.
- Compare the performance of loading 10,000 rows using individual INSERT statements versus the COPY command.
- Write a script to generate a large CSV file and bulk load it into PostgreSQL using both the server-side COPY and client-side \copy commands.
- Implement a data transformation during bulk loading by using a temporary table and SQL.