Astrophysics Papers: Daily Summaries Notebook

Fetch today’s astro-ph papers from arXiv, summarize abstracts with llm, and output a Markdown summary.

This is an example of using the Jetstream Inference Service. Note that you first need to configure the llm package to access the Jetstream Inference Service via its API.

# Install required packages
#!pip install llm requests tqdm

import datetime
import requests
import xml.etree.ElementTree as ET
import llm

# Get default LLM model (uses configured default, e.g. deepseek)
model = llm.get_model()

# Define system prompt for concise summaries
system_prompt = (
    'You are an expert summarization assistant. '
    'Provide a single concise sentence capturing the main result of an astrophysics abstract.'
)

# Download and parse today's astro-ph RSS feed from arXiv; keep only the first 3 papers
rss_url = "https://rss.arxiv.org/rss/astro-ph"
res = requests.get(rss_url, timeout=30)
res.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
root = ET.fromstring(res.content)
items = root.find('channel').findall('item')[:3]

papers = []
for item in items:
    title = item.findtext('title', default='')
    link = item.findtext('link', default='')
    desc = item.findtext('description', default='')
    author = item.findtext('{http://purl.org/dc/elements/1.1/}creator', default='')
    # Extract abstract from description (after 'Abstract:')
    abstract = ''
    if 'Abstract:' in desc:
        abstract = desc.split('Abstract:', 1)[1].strip()
    papers.append({
        'title': title,
        'link': link,
        'abstract': abstract,
        'author': author,
        'inst': ''
    })
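The parsing step above can be tried without network access. What follows is a minimal sketch, assuming the feed keeps the RSS 2.0 layout the cell relies on (a `channel` element with `item` children and a Dublin Core `dc:creator` field). The helper name `parse_feed` and the sample XML are hypothetical, invented for illustration, and are not real arXiv data.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by the feed for the creator (author) field
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical sample feed, made up for this sketch (not real arXiv content)
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>example feed</title>
    <item>
      <title>First example paper</title>
      <link>https://example.org/abs/0001</link>
      <description>arXiv:0001 Announce Type: new
Abstract: We measure something.</description>
      <dc:creator>A. Author, B. Author</dc:creator>
    </item>
    <item>
      <title>Second example paper</title>
      <link>https://example.org/abs/0002</link>
      <description>No marker in this description.</description>
      <dc:creator>C. Author</dc:creator>
    </item>
  </channel>
</rss>
"""


def parse_feed(xml_bytes, limit=3):
    """Return up to `limit` papers as dicts with title, link, abstract, author."""
    root = ET.fromstring(xml_bytes)
    channel = root.find("channel")
    if channel is None:
        # Not an RSS 2.0 document in the expected shape
        return []
    papers = []
    for item in channel.findall("item")[:limit]:
        desc = item.findtext("description", default="")
        # partition() yields an empty marker when 'Abstract:' is absent
        _, marker, tail = desc.partition("Abstract:")
        papers.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "abstract": tail.strip() if marker else "",
            "author": item.findtext(DC_NS + "creator", default=""),
        })
    return papers


for paper in parse_feed(SAMPLE_FEED):
    print(paper["title"], "|", paper["author"], "|", repr(paper["abstract"]))
```

Wrapping the logic in a function with a `limit` parameter makes it easy to change how many papers are summarized, and the `channel is None` check avoids an `AttributeError` if the response is not the expected feed.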

from tqdm.notebook import tqdm
import re

today = datetime.date.today().strftime('%Y-%m-%d')
# Summarize abstracts and build Markdown content
lines = [f'# Astrophysics Papers for {today}\n']
for p in tqdm(papers, desc="Summarizing papers"):
    resp = model.prompt(p['abstract'], system=system_prompt)
    # Remove any <think>...</think> blocks from the response text
    summary = re.sub(r'<think>.*?</think>', '', resp.text(), flags=re.DOTALL).strip()
    lines.append(
        f"## {p['title']}\n"
        f"- **Author:** {p['author']}\n"
        f"- **Institution:** {p['inst']}\n"
        f"- **Link:** {p['link']}\n"
        f"**Summary:** {summary}\n"
    )
md = '\n'.join(lines)

# Save Markdown summary
fn = f'astro_ph_summaries_{today}.md'
with open(fn, 'w', encoding='utf-8') as f:
    f.write(md)
print(f'Saved summary to {fn}')

Saved summary to astro_ph_summaries_2025-05-08.md
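Some reasoning models prepend their chain of thought in `<think>...</think>` tags, which is why the summarization cell strips those blocks before writing the summary. The sketch below isolates that step so it can be checked on its own. The helper name `strip_think` and the sample strings are hypothetical; the regular expression is the one used in the cell above.

```python
import re

# Non-greedy match so each <think>...</think> block is removed separately;
# DOTALL lets '.' match newlines inside a block.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text):
    """Remove every <think>...</think> block and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


# Hypothetical model response with a multi-line reasoning block
raw = "<think>\nThe abstract is about galaxies.\nKeep it short.\n</think>\n\nA short summary."
print(strip_think(raw))
```

Text without any `<think>` tags passes through unchanged apart from whitespace trimming, so the helper is safe to apply to responses from models that do not emit reasoning blocks.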

!cat $fn

# Astrophysics Papers for 2025-05-08

## Machine Learning Workflow for Morphological Classification of Galaxies
- **Author:** Bernd Doser, Kai L. Polsterer, Andreas Fehlner, Sebastian Trujillo-Gomez
- **Institution:** 
- **Link:** https://arxiv.org/abs/2505.04676
**Summary:** The study presents a reproducible, scalable machine-learning workflow leveraging open-source tools and FAIR principles to efficiently analyze exascale astrophysical simulations, enabling collaborative exploration of galaxy morphologies.

## A data-driven approach for star formation parameterization using symbolic regression
- **Author:** Diane M. Salim, Matthew E. Orr, Blakesley Burkhart, Rachel S. Somerville, Miles Cramner
- **Institution:** 
- **Link:** https://arxiv.org/abs/2505.04681
**Summary:** Machine learning-driven symbolic regression applied to FIRE-2 simulations reveals that star formation rate surface density at 100 Myr scales robustly with gas surface density, velocity dispersion, and stellar surface density, converging to physically interpretable Kennicutt-Schmidt-like relations that capture intrinsic scatter.

## Reassessing the ZTF Volume-Limited Type Ia Supernova Sample and Its Implications for Continuous, Dust-Dependent Models of Intrinsic Scatter
- **Author:** Yukei S. Murakami, Daniel Scolnic
- **Institution:** 
- **Link:** https://arxiv.org/abs/2505.04686
**Summary:** The re-analysis of the ZTF DR2 Type Ia supernovae sample using a continuous, color-dependent model (Host2D) supports the dust hypothesis for luminosity diversity, revealing methodological differences—not sample properties—as the source of prior conflicts, with a 4.0σ improvement over previous models.