2025-8

The basic idea for computing hourly total precipitation from MPAS output is: take hour-to-hour differences of the accumulated rainc and rainnc fields, sum them, and interpolate the result onto a lon-lat grid (assuming the diag files are written hourly).
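
In other words, for the hour ending at time t (a sketch of the arithmetic; rainc and rainnc are the convective and grid-scale precipitation accumulated since the start of the run):

    tp(t) = [rainc(t) + rainnc(t)] - [rainc(t-1h) + rainnc(t-1h)]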

3C's: Calculate diffs, Concatenate and Convert

1. First, we do the diffs using cdo.

Parallel execution is handled with Python's subprocess and concurrent.futures.

In <project>/public/precip/call_precip.py:

#!/usr/bin/env python3

import sys
from datetime import datetime, timedelta
import subprocess
from concurrent.futures import ProcessPoolExecutor, as_completed
import os

def run_cdo_task(t1, t2):
    # Subtract the accumulated rainc/rainnc at t1 from those at t2, i.e. the
    # precipitation that fell during the hour ending at t2.
    file1 = f"diag.{t1.strftime('%Y-%m-%d_%H')}.00.00.nc"
    file2 = f"diag.{t2.strftime('%Y-%m-%d_%H')}.00.00.nc"
    outfile = f"diff_{t2.strftime('%Y-%m-%d_%H')}.nc"

    print(f"[{os.getpid()}] Processing {file1} -> {file2} => {outfile}")
    cmd = [
        "cdo", "-b", "F32", "-selname,rainc,rainnc",
        "-sub", file2, file1, outfile
    ]

    try:
        subprocess.run(cmd, check=True)
        return f"✓ {outfile} done"
    except subprocess.CalledProcessError as e:
        return f"✗ Failed: {file1} / {file2} ({e})"

def generate_task_times(start_str, end_str):
    # Build (t, t+1h) pairs covering every hour from start to end (inclusive).
    start = datetime.strptime(start_str, "%Y-%m-%d_%H")
    end = datetime.strptime(end_str, "%Y-%m-%d_%H")

    tasks = []
    current = start
    while current + timedelta(hours=1) <= end:
        tasks.append((current, current + timedelta(hours=1)))
        current += timedelta(hours=1)
    return tasks

def main(start_str, end_str, max_workers=None):
    tasks = generate_task_times(start_str, end_str)

    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        future_to_time = {executor.submit(run_cdo_task, t1, t2): (t1, t2) for t1, t2 in tasks}
        for future in as_completed(future_to_time):
            print(future.result())

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: ./calc_precip.py YYYY-MM-DD_HH YYYY-MM-DD_HH [max_workers]")
        sys.exit(1)

    start_time = sys.argv[1]
    end_time = sys.argv[2]
    max_workers = int(sys.argv[3]) if len(sys.argv) > 3 else None

    main(start_time, end_time, max_workers)
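
For example, to compute the diffs for a two-day window with 8 parallel workers (the times here are placeholders; use your own run's window):

python3 call_precip.py 2000-01-01_00 2000-01-03_00 8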

2. Next, we concatenate the hourly diff files using the ncrcat tool from NCO.

I've prepared a script that finds all the diff files, checks that the hourly sequence is complete, and concatenates them.

In <project>/public/precip/concat_precip_v2.py:

#!/usr/bin/env python3
import os
import sys
import re
import argparse
from datetime import datetime, timedelta
import subprocess

parser = argparse.ArgumentParser(description="Concatenate NetCDF files with 1-hour intervals using ncrcat.")
parser.add_argument("output", help="Name of the output NetCDF file")
args = parser.parse_args()
output_file = args.output

pattern = re.compile(r"diff_(\d{4}-\d{2}-\d{2})_(\d{2})\.nc") # matches files like diff_2000-01-01_00.nc

files = [f for f in os.listdir('.') if pattern.match(f)]

file_datetimes = []
file_map = {}
for f in files:
    match = pattern.match(f)
    if match:
        date_str, hour_str = match.groups()
        dt = datetime.strptime(f"{date_str} {hour_str}", "%Y-%m-%d %H")
        file_datetimes.append(dt)
        file_map[dt] = f

if not file_datetimes:
    print("❌ No matching files found.")
    sys.exit(1)

file_datetimes.sort()
# Check that the hourly sequence has no gaps
missing = []
for i in range(1, len(file_datetimes)):
    expected = file_datetimes[i - 1] + timedelta(hours=1)
    if file_datetimes[i] != expected:
        dt = expected
        while dt < file_datetimes[i]:
            missing.append(dt)
            dt += timedelta(hours=1)

if missing:
    print("⚠️ Warning: Missing files for the following datetimes:")
    for m in missing:
        print("  ", m.strftime("%Y-%m-%d %H:%M"))
else:
    print("✅ All hours are continuous.")

# Concatenate in chronological order along the record (time) dimension
input_files = [file_map[dt] for dt in file_datetimes]
try:
    subprocess.run(["ncrcat", "-O"] + input_files + [output_file], check=True)
except subprocess.CalledProcessError:
    print("❌ ncrcat failed. Make sure NCO is installed and files are valid.")
    sys.exit(2)

# Report
print(f"\nStart: {file_datetimes[0]}")
print(f"End:   {file_datetimes[-1]}")
print(f"Output file written: {output_file}")

The job scripts for steps 1 and 2, <project>/all_calc_precip.sh and <project>/all_concat_precip.sh, are generated with <notebooks>/.../figs/precip/Batch calc precip.ipynb. They are trivial but contain experiment-specific information, so they are not included here.

3. convert_mpas

After Calculating the diffs and Concatenating the results, step 3 is to Convert them onto a lon-lat grid. Use the script <project>/precip/convert-latlon.sh:

#!/bin/bash
# Note: not to be submitted. Load the modules first and run it interactively.

for file in *precip.nc; do
    # Skip if no .nc file exists
    [ -e "$file" ] || continue

    echo "Processing $file..."

    # convert_mpas writes the interpolated fields to latlon.nc in the current directory
    convert_mpas ../EXAMPLE.static.nc "$file"

    # Check if latlon.nc was created
    if [ -f latlon.nc ]; then
        # Extract base name without extension
        base="${file%.nc}"
        # Rename latlon.nc to something like originalfilename_latlon.nc
        mv latlon.nc "${base}_latlon.nc"
        echo "Output saved as ${base}_latlon.nc"
    else
        echo "latlon.nc not found for $file"
    fi
done

Since convert_mpas is called, don't forget to put copies of include_fields and target_domain in the same directory, if you need them.
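
If you do use include_fields to restrict the output, it is simply a list of variable names, one per line; for the diff files produced above that would be (a sketch, assuming only the precipitation fields are wanted):

rainc
rainnc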

Now you should have the hourly data: the converted files still carry rainc and rainnc separately, and their sum is the hourly total precipitation (tp).
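
A minimal sketch of that final sum with xarray (the file name is just an example following the naming convention above):

import xarray as xr

# Hourly precipitation increments, concatenated and interpolated to lon-lat
ds = xr.open_dataset("EXP_precip_latlon.nc")

# Total precipitation = convective (rainc) + grid-scale (rainnc)
tp = (ds["rainc"] + ds["rainnc"]).rename("tp")
tp.to_netcdf("EXP_tp_latlon.nc")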