A Technical Deep Dive into Budget Apportionment Tracking Using the OpenOMB API

Last night, working until midnight, I finished another tool for what I call a "technoscientific workflow," meant to make certain public data more digestible and traceable. The data in question concerns the Cuba "democracy" program, inaugurated by Bill Clinton in 1995. Approximately 70% of its information remains either redacted or simply unreported. Yet the available data, accessible through official channels like USASpending.gov, FA.gov, and FAC.gov and through civil-society projects like ProPublica and OpenOMB.org, offers a foundation for more critical consumption of content typically presented as disconnected from the political institutions and interests that sponsor it. This is far more nuanced than the account-labeling approaches some platforms have adopted for known state-affiliated media like RT or Xinhua. It requires building the infrastructure to read, process, and interrogate public data, and then presenting it clearly at the user layer. I believe this can be connected to the social cybersecurity movement based at Carnegie Mellon University.


The Apportionment Problem

The pilot tool I developed tracks the apportionments that the Office of Management and Budget (OMB) derives from annual appropriations acts, specifically as they concern the Cuba program. No later than twenty days before the fiscal year starts, or thirty days after the annual appropriations act is enacted, whichever comes later, the OMB must authorize the State Department to obligate funds for goods and services. The governing principle is blunt: "An obligation cannot be incurred without an OMB-approved apportionment."
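
The "whichever comes later" rule is simple date arithmetic. The sketch below is only an illustration: the function name is hypothetical, the enactment dates are invented, and it assumes the federal fiscal year N begins on October 1 of year N-1.

```python
from datetime import date, timedelta


def apportionment_deadline(fiscal_year: int, enactment: date) -> date:
    """Later of (20 days before the fiscal year starts) and (30 days after enactment)."""
    # Assumption: fiscal year N starts on October 1 of calendar year N-1.
    fy_start = date(fiscal_year - 1, 10, 1)
    before_start = fy_start - timedelta(days=20)
    after_enactment = enactment + timedelta(days=30)
    return max(before_start, after_enactment)


# Hypothetical enactment dates, for illustration only.
early = apportionment_deadline(2026, date(2025, 8, 1))   # act enacted well before the fiscal year
late = apportionment_deadline(2026, date(2026, 2, 3))    # act enacted after the fiscal year began
print(early.isoformat(), late.isoformat())
```

When the act is enacted early, the fiscal-year clock governs; when it is enacted late, the enactment clock does.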

This apportionment process is mandated by the Antideficiency Act and coordinated with federal agencies, though the OMB retains final decision-making authority. In 2022, Public Law 117-103 required the OMB to publish apportionment documents on a public website, but the agency fulfills this obligation by posting documents in JSON/Excel formats deliberately obscure to anyone without deep expertise in Treasury nomenclature. The agency has made transparency technically compliant while practically inaccessible.


The Account Architecture

Through fiscal year 2025, Congress guaranteed two-year funding coverage for foreign assistance through accounts like the Economic Support Fund (ESF) and Development Assistance, historically administered by USAID. As of fiscal 2026, these have been consolidated into the new National Security Investment Programs (NSIP) account. The NSIP account comprises the Trump administration's proposed "America First Opportunity Fund," which aims to replace the old foreign assistance structure.

The ESF, identified by Treasury Account Symbol (TAS) "072-1037", was historically the channel through which the OMB apportioned Cuba program funding. The "072" represented USAID, while "1037" identified the ESF line item in each appropriations act. The Treasury Appropriation Fund Symbol (TAFS) combines the TAS with the fiscal biennium it covers. For example, bilateral economic assistance in the 2026 National Security, Department of State, and Related Programs Appropriations Act flows through TAFS "019-1122 2026/2027", where "019" is the State Department's code, "1122" is the identifier of the new NSIP account, and "2026/2027" is the biennium covering $6.766 billion in total appropriations for the account.
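
To make the anatomy of a TAFS concrete, here is a minimal parsing sketch. It assumes the display form used in this post ("agency-account firstyear/lastyear"); the `Tafs` type and `parse_tafs` function are hypothetical names, and suffixed symbols such as "072-1037S-019" are deliberately out of scope.

```python
import re
from typing import NamedTuple


class Tafs(NamedTuple):
    agency: str       # three-digit agency code, e.g. "072" for USAID
    account: str      # four-digit main account code, e.g. "1037" for the ESF
    first_year: int   # first fiscal year of availability
    last_year: int    # last fiscal year of availability


# Assumed layout: "AAA-MMMM YYYY/YYYY", as written in this post.
_TAFS_RE = re.compile(r"^(\d{3})-(\d{4})\s+(\d{4})/(\d{4})$")


def parse_tafs(text: str) -> Tafs:
    """Split a TAFS string into agency, account, and period of availability."""
    match = _TAFS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized TAFS string: {text!r}")
    agency, account, first, last = match.groups()
    return Tafs(agency, account, int(first), int(last))


esf = parse_tafs("072-1037 2025/2026")
print(esf.agency, esf.account, esf.first_year, esf.last_year)
```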

Since Cuba program funding typically appears explicitly in appropriations acts, what matters to me as a researcher is tracking the precise financial volumes managed by each federal structure administering the program: USAID (through 2025) and the State Department's bureaus of Democracy, Human Rights, and Labor (DRL) and Western Hemisphere Affairs (WHA). To my knowledge, no researcher working on Cuba has focused on this particular moment in the US budget cycle. Indeed, no institutional or individual actor has attempted to trace the Cuba "democracy" program as comprehensively as I am proposing here.


How the Apportionment Process Works for Cuba

At some point in the fiscal (bi)annual cycle, the OMB apportions the total amount Congress allocated—typically $20-25 million—to the corresponding TAFS, say "072-1037 2025/2026," administered by USAID. Then I need to monitor each subsequent iteration of that TAFS to identify outbound transfers. My experience with OpenOMB, which provides a simpler interface for studying this, combined with its API, revealed a clear pattern: every outflow from the TAFS's Cuba line item corresponds to an entry in TAS "072-1037S-019"—the ESF funds administered by Foggy Bottom.

There, I must identify which bureau received the transfer by observing changes in two specific line items: "WHA Regional Funds" and "Democracy, Human Rights and Labor" (DRL). If one line's change exactly matches the amount that left "072-1037", I assign the transfer to that line. Usually, though, these lines receive several concurrent transfers, so the match is not exact. If only one line varies, and the amount it received exceeds what left "072-1037", I assign the transfer to that bureau. When both lines vary, I apply a threshold-based rule refined through years of data analysis:

  • If transferred amount > $2 million: assign to DRL.

  • If transferred amount ≤ $2 million: assign to WHA.
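
The assignment rules above can be sketched as one small function. This is a simplified reading, not the app's actual code: the function name and signature are hypothetical, and returning `None` for cases the rules do not cover (no line moved, or the single moving line received less than what left the origin) is my assumption.

```python
from typing import Optional

THRESHOLD = 2_000_000  # USD cutoff from the rule stated above


def assign_transfer(transferred: float, wha_delta: float, drl_delta: float) -> Optional[str]:
    """Return "WHA", "DRL", or None when the rules do not settle the case.

    transferred: amount that left the Cuba line of the origin TAFS.
    wha_delta / drl_delta: change in each destination line versus its previous iteration.
    """
    wha_changed = wha_delta != 0
    drl_changed = drl_delta != 0

    if wha_changed and drl_changed:
        # Both lines moved: fall back to the threshold rule.
        return "DRL" if transferred > THRESHOLD else "WHA"

    if wha_changed or drl_changed:
        name, delta = ("WHA", wha_delta) if wha_changed else ("DRL", drl_delta)
        # Exact match, or the line received more than what left the origin.
        return name if delta >= transferred else None

    # Assumption: with no movement in either line, leave the transfer unassigned.
    return None


print(assign_transfer(1_500_000, 1_500_000, 0))        # only WHA moved, exact match
print(assign_transfer(3_000_000, 500_000, 4_000_000))  # both moved, above threshold
```

Keeping the unresolved cases as `None` makes them easy to list for manual review instead of forcing a guess.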


Building the Tool: LLMs, Python, an API, and Human Domain Knowledge

With support from Claude, Grok, and DeepSeek, I built a Python application that downloads, caches, and processes TAFS documents; searches for the keyword "Cuba"; and determines transfer assignments between DRL and WHA. Currently, the logic is retrospective; I am still monitoring how the new NSIP TAFS will handle Cuba program funds, if at all. As mentioned, I relied on OpenOMB's public API, and I gave Claude the task of producing a first version so I had something to start from.

A known lesson: LLMs are powerful, but if you don't understand the process you're trying to automate, and if you don't bring your own programming logic to the task, you won't get optimal results. I discovered critical issues that no model caught automatically. For example:

  • Transfer destination tracking isn't linked by iteration number; it's linked by approval date. Early versions assumed iteration-to-iteration matching, which was wrong.

  • Determining whether WHA or DRL amounts actually changed requires comparing against the previous iteration within that TAFS. There's no explicit "inbound transfer" field, so you must reconstruct the logic yourself.


Key Code Sections

1. Fetching TAFS Documents with Year-Inclusive Search

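import json
import requests

# BASE_URL and HEADERS are module-level constants defined elsewhere in the app.
def fetch_tafs_by_year_and_code(year_start, year_end, tafs_code, cache):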
    """Searches for TAFS documents using full year-inclusive code"""
    full_code = f"{tafs_code}-{year_start}-{year_end}"
    url = f"{BASE_URL}/files/search?tafs={full_code}&limit=100"
    
    try:
        response = requests.get(url, headers=HEADERS, timeout=30)
        if response.status_code != 200:
            return []
        data = response.json()
        return data.get("results", [])
    except Exception as e:
        print(f"  ❌ Error in {full_code}: {e}")
        return []

2. Extracting Schedule Data from Serialized JSON

def fetch_file_lines(file_id, biennium, cache):
    """
    Downloads apportionment document and extracts ScheduleData.
    OMB stores line-item details in a serialized JSON string within 'sourceData'.
    """
    url = f"{BASE_URL}/files/{file_id}?sourceData=true"
    try:
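        response = requests.get(url, headers=HEADERS, timeout=30)
        # Treat any non-200 response as "no data", mirroring the search function
        if response.status_code != 200:
            return [], None
        data = response.json()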
        source_data_str = data.get("results", {}).get("sourceData")
        
        if not source_data_str:
            return [], None
        
        # Parse the serialized JSON
        source_data = json.loads(source_data_str)
        lines = source_data.get("ScheduleData", [])
        # Keep only the YYYY-MM-DD part; tolerate a missing or null timestamp
        approval_date = (source_data.get("ApprovalTimestamp") or "")[:10]
        
        return lines, approval_date
    except Exception as e:
        print(f"      ⚠️ Error downloading {file_id}: {e}")
        return [], None

Critical detail: The OMB API returns sourceData as a serialized JSON string, not a parsed object. You must call json.loads() before accessing ScheduleData[].
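
The double parse is easy to see offline. The payload below is synthetic: the field names come from the code above, but the values (line number, timestamp format, amount) are invented for illustration and are not real OMB data.

```python
import json

# Synthetic inner document; every value here is made up.
inner = {
    "ApprovalTimestamp": "2025-01-15T00:00:00",
    "ScheduleData": [
        {"LineNumber": "6011", "LineDescription": "Cuba", "ApprovedAmount": "20000000"}
    ],
}

# The API response carries the inner document as a STRING, so it is encoded twice.
response_body = json.dumps({"results": {"sourceData": json.dumps(inner)}})

outer = json.loads(response_body)           # first parse: the response itself
raw = outer["results"]["sourceData"]        # still a str, not a dict
source_data = json.loads(raw)               # second parse: the serialized document
lines = source_data["ScheduleData"]
approval_date = source_data["ApprovalTimestamp"][:10]

print(type(raw).__name__, lines[0]["LineDescription"], approval_date)
```

Indexing `raw["ScheduleData"]` directly would raise a `TypeError`, because a string cannot be indexed by key.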

3. Cuba Line Detection

def find_cuba_line(lines):
    """Searches ScheduleData for the 'Cuba' line item"""
    for line in lines:
        desc = (line.get("LineDescription", "") or "").lower()
        if "cuba" in desc:
            return {
                "lineNumber": line.get("LineNumber"),
                "description": line.get("LineDescription"),
                "amount": float(line.get("ApprovedAmount") or 0)  # null or missing -> 0
            }
    return None

4. WHA and DRL Amount Extraction

def find_wha_drl_amounts(lines):
    """Extracts WHA Regional Funds and DRL line amounts"""
    wha = 0.0
    drl = 0.0
    for line in lines:
        desc = (line.get("LineDescription", "") or "").lower()
        amount = float(line.get("ApprovedAmount") or 0)  # null or missing -> 0
        
        if "wha regional" in desc or "western hemisphere" in desc:
            wha = amount
        elif ("human rights" in desc and "democracy" in desc) or "drl" in desc:
            drl = amount
    
    return {"wha": wha, "drl": drl}

5. Date-Based Transfer Correlation

# Organize origin (072-1037) by approval date
origen_by_date = {}
for file_rec in origen_files:
    file_id = file_rec.get("fileId")
    lines, approval_date = fetch_file_lines(file_id, biennium, cache)
    cuba = find_cuba_line(lines)
    if cuba:
        origen_by_date[approval_date] = cuba["amount"]

# Organize destination (072-1037S-019) by approval date
destino_by_date = {}
for file_rec in destino_files:
    file_id = file_rec.get("fileId")
    lines, approval_date = fetch_file_lines(file_id, biennium, cache)
    if not approval_date:
        continue  # skip files that failed to download or carry no approval date
    amounts = find_wha_drl_amounts(lines)
    destino_by_date[approval_date] = {"wha": amounts["wha"], "drl": amounts["drl"]}

# Detect decreases (transfers)
fechas_origen = sorted(origen_by_date.keys())
for i in range(len(fechas_origen) - 1):
    fecha_prev = fechas_origen[i]
    fecha_curr = fechas_origen[i+1]
    monto_prev = origen_by_date[fecha_prev]
    monto_curr = origen_by_date[fecha_curr]
    
    if monto_curr < monto_prev:
        transferred = monto_prev - monto_curr
        
        # Look up destination file with SAME APPROVAL DATE
        if fecha_curr in destino_by_date:
            dest_amounts = destino_by_date[fecha_curr]
            # Apply assignment rules...
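
Before any assignment rule can run, the destination amounts must be turned into changes relative to the previous destination iteration. The sketch below shows one way to do that over a dictionary shaped like `destino_by_date`; the helper name is hypothetical and the sample figures are invented. It assumes approval dates are ISO "YYYY-MM-DD" strings, so sorting them as text also sorts them chronologically.

```python
def destination_deltas(destino_by_date, fecha):
    """Change in WHA and DRL at `fecha` versus the previous destination iteration.

    Returns None when `fecha` is unknown or has no earlier iteration to compare with.
    """
    fechas = sorted(destino_by_date)  # ISO dates sort chronologically as strings
    if fecha not in destino_by_date:
        return None
    idx = fechas.index(fecha)
    if idx == 0:
        return None  # first iteration: nothing to compare against
    prev = destino_by_date[fechas[idx - 1]]
    curr = destino_by_date[fecha]
    return {"wha": curr["wha"] - prev["wha"], "drl": curr["drl"] - prev["drl"]}


# Invented sample data, shaped like the destino_by_date dictionary above.
sample = {
    "2025-01-10": {"wha": 5_000_000.0, "drl": 10_000_000.0},
    "2025-03-02": {"wha": 5_000_000.0, "drl": 13_000_000.0},
}
deltas = destination_deltas(sample, "2025-03-02")
print(deltas)
```

In this sample only the DRL line moved, which is the single-line case described earlier.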

Why This Matters

The Cuba "democracy" program has operated for nearly 30 years with minimal public oversight of its budget mechanics. This pilot app demonstrates that genuine, usable transparency requires building tools as well as demanding open data. Civil society must be empowered to read, process, and interrogate such data. By scrutinizing the mechanism through which Cuba-linked funds are apportioned, I believe we move toward more democratic accountability and strengthen people's resilience in an information jungle.


