Derive hourly total precipitation from MPAS output
The simple idea of calculating hourly total precipitation data from MPAS is that, start by calculating the differences in the rainc
and rainnc
fields, summing them up, and interpolate onto a lon-lat grid. (Assuming the diag files are output hourly)
3C's: Calculate diffs, Concatenate and Convert
1. First, we do the diffs using cdo
.
Added parallel running support using python's subprocess
.
In <project>/public/precip/call_precip.py
:
#!/usr/bin/env python3
import sys
from datetime import datetime, timedelta
import subprocess
from concurrent.futures import ProcessPoolExecutor, as_completed
import os
def run_cdo_task(t1, t2):
file1 = f"diag.{t1.strftime('%Y-%m-%d_%H')}.00.00.nc"
file2 = f"diag.{t2.strftime('%Y-%m-%d_%H')}.00.00.nc"
outfile = f"diff_{t2.strftime('%Y-%m-%d_%H')}.nc"
print(f"[{os.getpid()}] Processing {file1} -> {file2} => {outfile}")
cmd = [
"cdo", "-b", "F32", "-selname,rainc,rainnc",
"-sub", file2, file1, outfile
]
try:
subprocess.run(cmd, check=True)
return f"✓ {outfile} done"
except subprocess.CalledProcessError as e:
return f"✗ Failed: {file1} / {file2} ({e})"
def generate_task_times(start_str, end_str):
start = datetime.strptime(start_str, "%Y-%m-%d_%H")
end = datetime.strptime(end_str, "%Y-%m-%d_%H")
tasks = []
current = start
while current + timedelta(hours=1) <= end:
tasks.append((current, current + timedelta(hours=1)))
current += timedelta(hours=1)
return tasks
def main(start_str, end_str, max_workers=None):
tasks = generate_task_times(start_str, end_str)
with ProcessPoolExecutor(max_workers=max_workers) as executor:
future_to_time = {executor.submit(run_cdo_task, t1, t2): (t1, t2) for t1, t2 in tasks}
for future in as_completed(future_to_time):
print(future.result())
if __name__ == "__main__":
if len(sys.argv) < 3:
print("Usage: ./calc_precip.py YYYY-MM-DD_HH YYYY-MM-DD_HH [max_workers]")
sys.exit(1)
start_time = sys.argv[1]
end_time = sys.argv[2]
max_workers = int(sys.argv[3]) if len(sys.argv) > 3 else None
main(start_time, end_time, max_workers)
After this, we also need to
2. concat them using ncrcat
tool in nco
.
I've prepared a useful script to look for all the files and concat them.
In <project>/public/precip/concat_precip_v2.py
:
#!/usr/bin/env python3
import os
import re
import argparse
from datetime import datetime, timedelta
import subprocess
parser = argparse.ArgumentParser(description="Concatenate NetCDF files with 1-hour intervals using ncrcat.")
parser.add_argument("output", help="Name of the output NetCDF file")
args = parser.parse_args()
output_file = args.output
pattern = re.compile(r"diff_(\d{4}-\d{2}-\d{2})_(\d{2})\.nc") # matches files like diff_2000-01-01_00.nc
files = [f for f in os.listdir('.') if pattern.match(f)]
file_datetimes = []
file_map = {}
for f in files:
match = pattern.match(f)
if match:
date_str, hour_str = match.groups()
dt = datetime.strptime(f"{date_str} {hour_str}", "%Y-%m-%d %H")
file_datetimes.append(dt)
file_map[dt] = f
if not file_datetimes:
print("❌ No matching files found.")
exit(1)
file_datetimes.sort()
missing = []
for i in range(1, len(file_datetimes)):
expected = file_datetimes[i - 1] + timedelta(hours=1)
if file_datetimes[i] != expected:
dt = expected
while dt < file_datetimes[i]:
missing.append(dt)
dt += timedelta(hours=1)
if missing:
print("⚠️ Warning: Missing files for the following datetimes:")
for m in missing:
print(" ", m.strftime("%Y-%m-%d %H:%M"))
else:
print("✅ All hours are continuous.")
input_files = [file_map[dt] for dt in file_datetimes]
try:
subprocess.run(["ncrcat", "-O"] + input_files + [output_file], check=True)
except subprocess.CalledProcessError:
print("❌ ncrcat failed. Make sure NCO is installed and files are valid.")
exit(2)
# Report
print(f"\nStart: {file_datetimes[0]}")
print(f"End: {file_datetimes[-1]}")
print(f"Output file written: {output_file}")
The job scripts to run for 1. and 2.,<project>/all_calc_precip.sh
and<project>/all_concat_precip.sh
are generated using<notebooks>/.../figs/precip/Batch calc precip.ipynb
. These are trivial and contain experiment information, thus are not disclosed.
3. convert_mpas
After Calculating the diffs and Concatenating the process, step 3 is to Convert onto lon-lat grid. Use the script <project>/precip/convert-latlon.sh
:
#!/bin/bash
# Note: not to be submitted. Load the modules first and run it interactively.
for file in *precip.nc; do
# Skip if no .nc file exists
[ -e "$file" ] || continue
echo "Processing $file..."
convert_mpas ../EXAMPLE.static.nc "$file"
# Check if latlon.nc was created
if [ -f latlon.nc ]; then
# Extract base name without extension
base="${file%.nc}"
# Rename latlon.nc to something like originalfilename_latlon.nc
mv latlon.nc "${base}_latlon.nc"
echo "Output saved as ${base}_latlon.nc"
else
echo "latlon.nc not found for $file"
fi
done
Since convert_mpas
is called, don't forget to put copies of include_fields
and target_domain
in the same directory, if you need them.
Now you should get the hourly data of tp.