Articles by Luojie D

The basic idea of calculating hourly total precipitation from MPAS is simple: calculate the hour-to-hour differences of the accumulated rainc and rainnc fields, sum them up, and interpolate the result onto a lon-lat grid. (This assumes the diag files are output hourly.)
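
Since rainc (convective) and rainnc (grid-scale) are accumulated from model start, the hourly total is just the difference of their sums at consecutive output times. A minimal numpy sketch of that arithmetic (array shapes and values are illustrative, not from real MPAS output):

```python
import numpy as np

def hourly_precip(rainc_t0, rainnc_t0, rainc_t1, rainnc_t1):
    """Hourly total precipitation from accumulated convective (rainc)
    and grid-scale (rainnc) fields at two consecutive output times."""
    return (rainc_t1 + rainnc_t1) - (rainc_t0 + rainnc_t0)

# Illustrative accumulated values on a tiny "mesh" of 3 cells:
rainc_t0  = np.array([1.0, 0.5, 0.0])
rainnc_t0 = np.array([2.0, 0.0, 0.1])
rainc_t1  = np.array([1.5, 0.5, 0.0])
rainnc_t1 = np.array([3.0, 0.2, 0.1])
print(hourly_precip(rainc_t0, rainnc_t0, rainc_t1, rainnc_t1))  # [1.5 0.2 0. ]
```

This is exactly what the cdo `-sub` call in step 1 below does, only on the full mesh and reading NetCDF directly.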

3C's: Calculate diffs, Concatenate and Convert

1. First, we do the diffs using cdo.

I added parallel execution using Python's subprocess together with concurrent.futures.

In <project>/public/precip/call_precip.py:

#!/usr/bin/env python3

import sys
from datetime import datetime, timedelta
import subprocess
from concurrent.futures import ProcessPoolExecutor, as_completed
import os

def run_cdo_task(t1, t2):
    file1 = f"diag.{t1.strftime('%Y-%m-%d_%H')}.00.00.nc"
    file2 = f"diag.{t2.strftime('%Y-%m-%d_%H')}.00.00.nc"
    outfile = f"diff_{t2.strftime('%Y-%m-%d_%H')}.nc"

    print(f"[{os.getpid()}] Processing {file1} -> {file2} => {outfile}")
    cmd = [
        "cdo", "-b", "F32", "-selname,rainc,rainnc",
        "-sub", file2, file1, outfile
    ]

    try:
        subprocess.run(cmd, check=True)
        return f"✓ {outfile} done"
    except subprocess.CalledProcessError as e:
        return f"✗ Failed: {file1} / {file2} ({e})"

def generate_task_times(start_str, end_str):
    start = datetime.strptime(start_str, "%Y-%m-%d_%H")
    end = datetime.strptime(end_str, "%Y-%m-%d_%H")

    tasks = []
    current = start
    while current + timedelta(hours=1) <= end:
        tasks.append((current, current + timedelta(hours=1)))
        current += timedelta(hours=1)
    return tasks

def main(start_str, end_str, max_workers=None):
    tasks = generate_task_times(start_str, end_str)

    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        future_to_time = {executor.submit(run_cdo_task, t1, t2): (t1, t2) for t1, t2 in tasks}
        for future in as_completed(future_to_time):
            print(future.result())

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: ./call_precip.py YYYY-MM-DD_HH YYYY-MM-DD_HH [max_workers]")
        sys.exit(1)

    start_time = sys.argv[1]
    end_time = sys.argv[2]
    max_workers = int(sys.argv[3]) if len(sys.argv) > 3 else None

    main(start_time, end_time, max_workers)
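
Before launching a long run, it can help to check that every diag file in the requested window actually exists. A small standalone sketch reusing the filename pattern from the script above (expected_diag_files is my own helper, not part of the script):

```python
from datetime import datetime, timedelta
import os

def expected_diag_files(start_str, end_str):
    """All diag filenames covering [start, end] at hourly spacing."""
    start = datetime.strptime(start_str, "%Y-%m-%d_%H")
    end = datetime.strptime(end_str, "%Y-%m-%d_%H")
    names = []
    t = start
    while t <= end:
        names.append(f"diag.{t.strftime('%Y-%m-%d_%H')}.00.00.nc")
        t += timedelta(hours=1)
    return names

names = expected_diag_files("2000-01-01_00", "2000-01-01_03")
missing = [n for n in names if not os.path.exists(n)]
print(f"{len(names)} expected, {len(missing)} missing")
```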

After this, we also need to

2. concatenate them using the ncrcat tool from NCO.

I've prepared a script that finds all the diff files, checks that the hourly sequence is continuous, and concatenates them.

In <project>/public/precip/concat_precip_v2.py:

#!/usr/bin/env python3
import os
import re
import argparse
from datetime import datetime, timedelta
import subprocess

parser = argparse.ArgumentParser(description="Concatenate NetCDF files with 1-hour intervals using ncrcat.")
parser.add_argument("output", help="Name of the output NetCDF file")
args = parser.parse_args()
output_file = args.output

pattern = re.compile(r"diff_(\d{4}-\d{2}-\d{2})_(\d{2})\.nc") # matches files like diff_2000-01-01_00.nc

files = [f for f in os.listdir('.') if pattern.match(f)]

file_datetimes = []
file_map = {}
for f in files:
    match = pattern.match(f)
    if match:
        date_str, hour_str = match.groups()
        dt = datetime.strptime(f"{date_str} {hour_str}", "%Y-%m-%d %H")
        file_datetimes.append(dt)
        file_map[dt] = f

if not file_datetimes:
    print("❌ No matching files found.")
    exit(1)

file_datetimes.sort()
missing = []
for i in range(1, len(file_datetimes)):
    expected = file_datetimes[i - 1] + timedelta(hours=1)
    if file_datetimes[i] != expected:
        dt = expected
        while dt < file_datetimes[i]:
            missing.append(dt)
            dt += timedelta(hours=1)

if missing:
    print("⚠️ Warning: Missing files for the following datetimes:")
    for m in missing:
        print("  ", m.strftime("%Y-%m-%d %H:%M"))
else:
    print("✅ All hours are continuous.")

input_files = [file_map[dt] for dt in file_datetimes]
try:
    subprocess.run(["ncrcat", "-O"] + input_files + [output_file], check=True)
except subprocess.CalledProcessError:
    print("❌ ncrcat failed. Make sure NCO is installed and files are valid.")
    exit(2)

# Report
print(f"\nStart: {file_datetimes[0]}")
print(f"End:   {file_datetimes[-1]}")
print(f"Output file written: {output_file}")

The job scripts for steps 1 and 2, <project>/all_calc_precip.sh and <project>/all_concat_precip.sh, are generated using <notebooks>/.../figs/precip/Batch calc precip.ipynb. They are trivial and contain experiment-specific information, so they are not disclosed here.

3. convert_mpas

After Calculating the diffs and Concatenating the results, step 3 is to Convert them onto a lon-lat grid. Use the script <project>/precip/convert-latlon.sh:

#!/bin/bash
# Note: not to be submitted. Load the modules first and run it interactively.

for file in *precip.nc; do
    # Guard against an unmatched glob (no *precip.nc files)
    [ -e "$file" ] || continue

    echo "Processing $file..."

    convert_mpas ../EXAMPLE.static.nc  "$file"

    # Check if latlon.nc was created
    if [ -f latlon.nc ]; then
        # Extract base name without extension
        base="${file%.nc}"
        # Rename latlon.nc to something like originalfilename_latlon.nc
        mv latlon.nc "${base}_latlon.nc"
        echo "Output saved as ${base}_latlon.nc"
    else
        echo "latlon.nc not found for $file"
    fi
done

Since convert_mpas is called, don't forget to put copies of include_fields and target_domain in the same directory if you need them.
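
For reference, both files are plain text: include_fields is just a list of field names, one per line (here, rainc and rainnc), while target_domain describes the output lon-lat grid. If I remember the convert_mpas README correctly, target_domain takes lines roughly like the following, but this is an assumption worth checking against the README (grid values are illustrative):

```
nlat = 360
nlon = 720
```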

Now you should have the hourly total precipitation (tp) data.

Published on 2025/08/25
Originally finished on 2024/10/03

Note: This method is not the most efficient unless you need large chunks of data. For anything shorter than a whole month, I would actually recommend downloading with the cdsapi module in Python as a script, or using the CDS portal directly. You can download about 11 days of pressure-level data in one go, and even more for surface-level data. (I am planning to write more on that later.) This article is thus provided as-is and for your reference only.
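
For completeness, here is a rough sketch of what such a cdsapi request could look like. The dataset and parameter names are my assumptions based on the CDS ERA5 pressure-level dataset, so double-check them on the CDS portal before use; the actual retrieve call is commented out since it requires CDS credentials:

```python
# Request for ~11 days of ERA5 pressure-level data via the CDS API.
# Dataset and key names follow the CDS "reanalysis-era5-pressure-levels"
# dataset; verify them on the CDS portal before submitting.
request = {
    "product_type": "reanalysis",
    "variable": ["temperature", "u_component_of_wind", "v_component_of_wind"],
    "pressure_level": ["500", "850"],
    "year": "2023",
    "month": "04",
    "day": [f"{d:02d}" for d in range(1, 12)],   # 11 days in one go
    "time": [f"{h:02d}:00" for h in range(24)],  # hourly
    "format": "grib",
}

# To actually download (requires the cdsapi package and ~/.cdsapirc):
#   import cdsapi
#   cdsapi.Client().retrieve("reanalysis-era5-pressure-levels",
#                            request, "era5_pl.grib")
print(len(request["day"]), "days,", len(request["time"]), "hours/day")
# → 11 days, 24 hours/day
```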

Although NCAR’s repository for ERA5 data in grib was phased out, the data can, for now, still be obtained here: TDS Catalog.

ERA5 has different categories, such as surface (SFC) and pressure level (PL). They do not need to be separated into two streams of initial conditions.

Also, the variable LANDSEA, which is a binary land/sea mask, is located in ERA5’s invariant category. If it is not downloaded and ungrib-ed, MPAS will complain:

ERROR: ********************************************************************************
ERROR: LANDSEA field not found in meteorological data file ERA5PL:2023-04-22_00
CRITICAL ERROR: ********************************************************************************
Logging complete.  Closing file at 2024/10/03 14:33:39

Check the Vtable for ERA5:

GRIB | Level| Level| Level| metgrid  |  metgrid | metgrid                                  |
Code | Code |   1  |   2  | Name     |  Units   | Description                              |
-----+------+------+------+----------+----------+------------------------------------------+
 129 | 100  |   *  |      | GEOPT    | m2 s-2   |                                          |
     | 100  |   *  |      | HGT      | m        | Height                                   |
 130 | 100  |   *  |      | TT       | K        | Temperature                              |
 131 | 100  |   *  |      | UU       | m s-1    | U                                        |
 132 | 100  |   *  |      | VV       | m s-1    | V                                        |
 157 | 100  |   *  |      | RH       | %        | Relative Humidity                        |
 165 |  1   |   0  |      | UU       | m s-1    | U                                        | At 10 m
 166 |  1   |   0  |      | VV       | m s-1    | V                                        | At 10 m
 167 |  1   |   0  |      | TT       | K        | Temperature                              | At 2 m
 168 |  1   |   0  |      | DEWPT    | K        |                                          | At 2 m
     |  1   |   0  |      | RH       | %        | Relative Humidity                        | At 2 m
 172 |  1   |   0  |      | LANDSEA  | 0/1 Flag | Land/Sea flag                            |
 129 |  1   |   0  |      | SOILGEO  | m2 s-2   |                                          |
...

GRIB Code for Land/Sea flag is 172.

(https://thredds.rda.ucar.edu/thredds/catalog/files/g/d633000/e5.oper.invariant/197901/catalog.html)

It is in the invariant field.

Published on 2025/08/25
Originally finished on 2024/10/01

TL;DR: Use the existing modules on NSCC.

Note: NSCC is a local supercomputer. If your supercomputer does not have wps out-of-the-box or as a module, you might need to download and build it.

Get WPS

Don’t reinvent the wheel.

module load wps
module load wrf
cp -r /app/apps/wps <RELEVANT_PATH>

Then edit the namelist file.

For Vtable, refer to this website: WRF - Free Data.

Use WPS

First, run the link_grib script on your grib files. It will generate links named GRIBFILE.AAA, GRIBFILE.AAB, and so on.

Then, edit namelist.wps accordingly. Mainly, set the start and end datetimes and the file interval, and also the filename prefix in the &ungrib section.

Then, run ungrib.exe. If it complains, fix things according to the error message.

ungrib.exe will go through the GRIBFILEs one by one and try to extract useful fields. Since not every GRIBFILE is necessarily relevant, it is normal to see empty inventories, something like

Inventory for date = 20XX-XX-XX XX:00:00

PRES   GEOPT    HGT      TT       UU       VV       RH       DEWPT    LANDSEA  SOILGEO  SOILHGT  PSFC     PMSL     SKINTEMP SEAICE   SST      SNOW_DEN SNOW_EC  SNOW     SNOWH    ST000007 ST007028 ST028100 ST100289 SM000007 SM007028 SM028100 SM100289
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------