To split a large JSON file into multiple smaller files in Python, the general approach is: read the file, break its contents into chunks of a manageable size, and write each chunk to its own file. Here's an example implementation:
import json

# Function to split a list into smaller chunks
def chunk_list(lst, chunk_size):
    for i in range(0, len(lst), chunk_size):
        yield lst[i:i + chunk_size]

# Read the large JSON file
large_json_file = 'large_file.json'
with open(large_json_file, 'r') as f:
    data = json.load(f)

# Split data into smaller chunks
chunk_size = 100  # Adjust as needed
data_chunks = chunk_list(data, chunk_size)

# Write each chunk to a separate JSON file
for i, chunk in enumerate(data_chunks):
    output_file = f'small_chunk_{i + 1}.json'
    with open(output_file, 'w') as f:
        json.dump(chunk, f, indent=4)

print("Splitting complete.")
In this example, large_file.json is the large input JSON file you want to split. The chunk_list() function is a generator that splits a list into smaller chunks of a specified size. Adjust the chunk_size variable according to how you want to split the data; smaller chunks are better if memory is a concern. Each chunk is written to its own JSON file: small_chunk_1.json, small_chunk_2.json, and so on.
Keep in mind that this example assumes that your JSON data is a list of dictionaries. If your JSON data has a different structure, you might need to modify the code accordingly.
Also, make sure to handle any exceptions that might occur during file reading and writing to ensure your code is robust.
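For example, a minimal guard around the read step might look like this (using the same large_file.json input as above; adapt the handling to your needs):

import json

try:
    with open('large_file.json', 'r') as f:
        data = json.load(f)
except FileNotFoundError:
    print("Input file not found.")
    raise
except json.JSONDecodeError as e:
    print(f"Input is not valid JSON: {e}")
    raise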
How to split a large JSON file into smaller files in Python?
Description: This query demonstrates how to split a large JSON file into smaller chunks.
Code:
# Create a large sample JSON file
echo '{"data": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}' > large.json
import json

# Load the large JSON file
with open('large.json', 'r') as f:
    data = json.load(f)

# Define the size of each chunk
chunk_size = 3
data_list = data["data"]

# Split into smaller chunks
chunks = [data_list[i:i + chunk_size] for i in range(0, len(data_list), chunk_size)]

# Write each chunk to a separate file
for i, chunk in enumerate(chunks):
    with open(f'chunk_{i}.json', 'w') as f:
        json.dump({"data": chunk}, f)

print("Split into chunks:", len(chunks))
How to split a large JSON file based on a key in Python?
Description: This query demonstrates splitting a large JSON file based on a specific key.
Code:
# Create a JSON file with multiple records
echo '{"records": [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]}' > large.json
import json

# Load the large JSON file
with open('large.json', 'r') as f:
    data = json.load(f)

# Split based on the "id" key, creating a separate file for each record
for record in data["records"]:
    with open(f'record_{record["id"]}.json', 'w') as f:
        json.dump(record, f)

print("Split into individual records")
How to split a large JSON file into smaller files by line in Python?
Description: This query shows how to split a line-delimited JSON file (one JSON object per line) into smaller files based on the number of lines. Note that splitting an ordinary pretty-printed JSON file by lines would produce invalid JSON fragments.
Code:
# Create a sample JSON file with one JSON object per line
echo '{"line1": "data1"}' > large.json
echo '{"line2": "data2"}' >> large.json
echo '{"line3": "data3"}' >> large.json
# Define the chunk size by number of lines
chunk_size = 2

# Read the large JSON file
with open('large.json', 'r') as f:
    lines = f.readlines()

# Split into smaller chunks based on line count
chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

# Write each chunk to a separate file
for i, chunk in enumerate(chunks):
    with open(f'chunk_{i}.json', 'w') as f:
        f.writelines(chunk)

print("Split into chunks by lines")
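If the file has too many lines to hold in memory at once, the same split can be done lazily. Here is a sketch that reads chunk_size lines at a time with itertools.islice instead of readlines():

import itertools

chunk_size = 2

with open('large.json', 'r') as f:
    for i in itertools.count():
        # Pull the next chunk_size lines without reading the whole file
        chunk = list(itertools.islice(f, chunk_size))
        if not chunk:
            break
        with open(f'chunk_{i}.json', 'w') as out:
            out.writelines(chunk)

print("Split into chunks by lines (streaming)")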
How to split a large JSON array into smaller files in Python?
Description: This query demonstrates splitting a large JSON array into smaller files.
Code:
# Create a large JSON array
echo '{"array": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}' > large.json
import json

# Split a large JSON array into smaller files
with open('large.json', 'r') as f:
    data = json.load(f)

array = data["array"]
chunk_size = 3

# Split into smaller chunks
chunks = [array[i:i + chunk_size] for i in range(0, len(array), chunk_size)]

for i, chunk in enumerate(chunks):
    with open(f'array_chunk_{i}.json', 'w') as f:
        json.dump({"array": chunk}, f)

print("Split JSON array into smaller files")
How to split a large JSON file into smaller files based on key-value pairs in Python?
Description: This query demonstrates splitting a large JSON file into smaller files based on unique key-value pairs.
Code:
# Create a JSON file with multiple key-value pairs
echo '{"items": [{"type": "A", "value": 1}, {"type": "B", "value": 2}, {"type": "A", "value": 3}]}' > large.json
import collections
import json

# Load the JSON file
with open('large.json', 'r') as f:
    data = json.load(f)

# Group items by the value of the "type" key
groups = collections.defaultdict(list)
for item in data["items"]:
    key = item["type"]
    groups[key].append(item)

# Write each group to a separate file
for key, items in groups.items():
    with open(f'group_{key}.json', 'w') as f:
        json.dump({"items": items}, f)

print("Split JSON based on key-value pairs")
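Using collections.defaultdict(list) avoids the explicit "if key not in groups" membership check; the time-based example further down shows the equivalent pattern with a plain dict. Either style works.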
How to split a large JSON file into smaller files based on a specific condition in Python?
Description: This query demonstrates splitting a JSON file into smaller files based on a specific condition.
Code:
# Create a JSON file with various values
echo '{"data": [{"id": 1, "value": 10}, {"id": 2, "value": 20}, {"id": 3, "value": 30}]}' > large.json
import json

# Load the JSON file
with open('large.json', 'r') as f:
    data = json.load(f)

# Define a condition for splitting
threshold = 20
above_threshold = [d for d in data["data"] if d["value"] > threshold]
below_threshold = [d for d in data["data"] if d["value"] <= threshold]

# Write each subset to a separate file
with open('above_threshold.json', 'w') as f:
    json.dump({"data": above_threshold}, f)

with open('below_threshold.json', 'w') as f:
    json.dump({"data": below_threshold}, f)

print("Split JSON based on a condition")
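If you need more than two output files, the same idea generalizes to a labeling function that maps each record to a bucket name (bucket_for() below is a hypothetical helper; replace it with your own condition):

import json

def bucket_for(record):
    # Hypothetical helper: return the bucket name for a record
    return 'above_threshold' if record["value"] > 20 else 'below_threshold'

with open('large.json', 'r') as f:
    data = json.load(f)

# Collect records into buckets keyed by label
buckets = {}
for record in data["data"]:
    buckets.setdefault(bucket_for(record), []).append(record)

# Write one file per bucket
for name, records in buckets.items():
    with open(f'{name}.json', 'w') as f:
        json.dump({"data": records}, f)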
How to split a large JSON file into smaller files with incremental naming in Python?
Description: This query demonstrates splitting a large JSON file into smaller files with incremental naming.
Code:
# Create a JSON file with a large list
echo '{"list": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}' > large.json
import json

# Load the JSON file
with open('large.json', 'r') as f:
    data = json.load(f)

# Split into smaller files with incremental naming
list_data = data["list"]
chunk_size = 3
chunks = [list_data[i:i + chunk_size] for i in range(0, len(list_data), chunk_size)]

for idx, chunk in enumerate(chunks):
    with open(f'split_{idx}.json', 'w') as f:
        json.dump({"list": chunk}, f)

print("Split JSON into smaller files with incremental naming")
How to handle and split large JSON files with nested structures in Python?
Description: This query explains how to split large JSON files with nested structures into smaller files.
Code:
# Create a JSON file with nested structures
echo '{"data": [{"group": {"id": 1, "name": "Group A"}}, {"group": {"id": 2, "name": "Group B"}}]}' > large.json
import json

# Load the JSON file with nested structures
with open('large.json', 'r') as f:
    data = json.load(f)

# Extract and split based on nested structures
groups = data["data"]
chunk_size = 1
chunks = [groups[i:i + chunk_size] for i in range(0, len(groups), chunk_size)]

for idx, chunk in enumerate(chunks):
    with open(f'group_split_{idx}.json', 'w') as f:
        json.dump({"data": chunk}, f)

print("Split JSON with nested structures")
How to split large JSON files by keys and save to multiple files in Python?
Description: This query demonstrates splitting JSON files into smaller files based on specific keys.
Code:
# Create a JSON file with multiple keys
echo '{"group1": [1, 2, 3], "group2": [4, 5, 6], "group3": [7, 8, 9]}' > large.json
import json

# Load the JSON file
with open('large.json', 'r') as f:
    data = json.load(f)

# Write each top-level key to its own file
for key, value in data.items():
    with open(f'{key}.json', 'w') as f:
        json.dump({key: value}, f)

print("Split JSON by keys")
How to split large JSON files by time-based data and save to multiple files in Python?
Description: This query demonstrates splitting large JSON files into smaller files based on time-based data.
Code:
# Create a JSON file with time-based data
echo '{"events": [{"timestamp": "2023-01-01", "event": "start"}, {"timestamp": "2023-01-02", "event": "end"}]}' > large.json
import json

# Load the JSON file
with open('large.json', 'r') as f:
    data = json.load(f)

# Group events by their timestamp
chunks = {}
for event in data["events"]:
    date = event["timestamp"]
    if date not in chunks:
        chunks[date] = []
    chunks[date].append(event)

# Write each date's events to a separate file
for date, events in chunks.items():
    with open(f'events_{date}.json', 'w') as f:
        json.dump({"events": events}, f)

print("Split JSON by time-based data")
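Note that every example above loads the whole document with json.load(), so the file must fit in memory. For files that are genuinely too large for that, a streaming parser can process records one at a time. Here is a sketch using the third-party ijson package (an assumption: it must be installed separately with pip install ijson, and the input is assumed to be a top-level JSON array):

import json
import ijson  # third-party streaming parser: pip install ijson

chunk_size = 100
chunk, file_index = [], 0

with open('large.json', 'rb') as f:
    # The 'item' prefix yields each element of a top-level JSON array
    # without loading the whole document into memory.
    for record in ijson.items(f, 'item'):
        chunk.append(record)
        if len(chunk) == chunk_size:
            with open(f'stream_chunk_{file_index}.json', 'w') as out:
                json.dump(chunk, out)
            chunk, file_index = [], file_index + 1

# Write any remaining records
if chunk:
    with open(f'stream_chunk_{file_index}.json', 'w') as out:
        json.dump(chunk, out)

One caveat: ijson may parse non-integer numbers as decimal.Decimal, which json.dump cannot serialize by default; pass a custom default= function to json.dump if your data contains floats.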