Problem Statement
How do you process a multi-gigabyte file efficiently in Python?
Explanation
Stream the data instead of loading it all into memory. Iterate line by line for text, or read fixed-size binary chunks for lower-level parsing. This keeps memory usage flat regardless of file size and lets you do the work incrementally. Use generators to pipe data through processing steps.
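A minimal sketch of the line-by-line, generator-pipeline idea. The file name, the "ERROR" filter, and the tab-separated format are illustrative assumptions, not part of the original problem:

import sys

def read_lines(path):
    # Yield one decoded line at a time; the file object is itself a lazy iterator.
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            yield line.rstrip('\n')

def keep_errors(lines):
    # Example filter stage: pass through only lines mentioning "ERROR".
    return (line for line in lines if 'ERROR' in line)

def to_fields(lines):
    # Example parse stage: split each surviving line into tab-separated fields.
    return (line.split('\t') for line in lines)

# Chain the stages; nothing is read until the loop pulls records through the pipeline.
for fields in to_fields(keep_errors(read_lines('big.log'))):
    print(fields[0], file=sys.stderr)  # do incremental work per record here

Because each stage is a generator, only one line is in flight at a time, no matter how large the file is.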
Tune buffering, avoid per-line string concatenation, and batch writes. If I/O is the bottleneck, consider reading gzip-compressed data as a stream or pre-filtering with external tools; if parsing is the bottleneck, spread CPU-bound work across processes with multiprocessing.
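A rough sketch of the gzip-streaming and batched-write ideas. The file names, the "ERROR" pre-filter, and the batch size are assumptions chosen for illustration:

import gzip

BATCH = 10_000  # flush output in batches instead of writing line by line

with gzip.open('big.log.gz', 'rt', encoding='utf-8') as src, \
     open('filtered.log', 'w', encoding='utf-8') as dst:
    batch = []
    for line in src:            # gzip.open streams; the archive is never fully decompressed in memory
        if 'ERROR' in line:     # hypothetical pre-filter
            batch.append(line)
        if len(batch) >= BATCH:
            dst.writelines(batch)   # one large write per batch instead of one per line
            batch.clear()
    if batch:
        dst.writelines(batch)       # flush the final partial batch

Batching the writes trades a little memory for far fewer write calls, which usually matters more than micro-optimizing the per-line work.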
Code Solution
def chunks(f, size=1024 * 1024):
    """Yield successive fixed-size binary chunks from an open file object."""
    while True:
        b = f.read(size)
        if not b:        # empty bytes means end of file
            break
        yield b

with open('big.bin', 'rb') as f:
    for c in chunks(f):
        process(c)  # process() is the caller-supplied work done on each chunk
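If the per-chunk work is CPU-bound rather than I/O-bound, a possible extension is to fan chunks out to worker processes. This sketch reuses the chunks() generator above; the worker count, chunk handling, and the newline-counting parse() placeholder are assumptions:

from multiprocessing import Pool

def parse(chunk):
    # Stand-in for CPU-heavy parsing; here it just counts newlines in the chunk.
    return chunk.count(b'\n')

if __name__ == '__main__':
    with open('big.bin', 'rb') as f, Pool(processes=4) as pool:
        # imap pulls chunks lazily and streams results back, so memory stays bounded.
        total = sum(pool.imap(parse, chunks(f), chunksize=1))
        print(total)

The parent process still does all the reading; only the parsing is parallelized, so this helps when parsing, not disk throughput, is the limit.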