9  Case Study: From Notebook to Package with nbdev

Note: Chapter Overview

This case study parallels Chapter 4 (SimpleBot), but follows a notebook-first workflow. We’ll build TextKit—a text analysis library—entirely in Jupyter notebooks, then ship it as a published Python package using nbdev.

9.1 Project Overview

TextKit is a lightweight library of simple utilities for analyzing text. Key features include:

  • Word and character statistics
  • Readability scoring (Flesch-Kincaid, etc.)
  • Basic sentiment indicators
  • Text cleaning utilities

This project is ideal for our notebook case study because:

  • Natural notebook fit: Text analysis involves exploration and visualization
  • Keeps the theme: Complements SimpleBot’s chatbot focus (analyzing what bots produce)
  • Real utility: Functions you’d actually use in data analysis
  • Right size: Small enough to complete, complex enough to demonstrate the workflow

By the end of this chapter, you’ll have a package published to PyPI—built entirely from notebooks.

9.2 Why nbdev for This Project?

In Chapter 8, we introduced nbdev as a way to develop libraries from notebooks. Here’s why it fits TextKit:

| Traditional Workflow      | nbdev Workflow                |
|---------------------------|-------------------------------|
| Write code in .py files   | Write code in notebooks       |
| Write separate test files | Tests live next to code       |
| Write docs separately     | Docs generated from notebooks |
| Context switching         | Single environment            |

For exploratory, iterative work like text analysis, nbdev keeps everything together.

9.3 1. Setting Up the nbdev Project

9.3.1 Installing nbdev

pip install nbdev

9.3.2 Creating the Project

nbdev_new --lib_name textkit --user yourusername --author "Your Name"
cd textkit

This creates:

textkit/
├── nbs/                    # Your notebooks live here
│   ├── 00_core.ipynb       # Main module
│   ├── index.ipynb         # Becomes README and docs homepage
│   └── _quarto.yml         # Documentation config
├── textkit/                # Generated Python package (don't edit directly)
├── settings.ini            # Project configuration
├── setup.py                # Generated for pip install
└── pyproject.toml

9.3.3 Key Insight: You Edit Notebooks, Not .py Files

The textkit/ directory contains generated code. Your source of truth is nbs/*.ipynb.
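Most of that project configuration lives in settings.ini. A minimal sketch of the fields nbdev_new fills in (the values here are placeholders, and your generated file will contain additional keys):

```ini
[DEFAULT]
# Project identity - placeholders, use your own values
lib_name = textkit
user = yourusername
author = Your Name
description = Simple text analysis for Python
version = 0.0.1
min_python = 3.9
license = apache2
```

nbdev reads this file when exporting modules, building docs, and generating setup.py, so it is the one configuration file you edit by hand.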

9.4 2. Building the Core Module

9.4.1 The First Notebook: 00_core.ipynb

Open nbs/00_core.ipynb in Jupyter. The structure:

# Cell 1: Module header
#| default_exp core

This directive tells nbdev: “export cells from this notebook to textkit/core.py”.

9.4.2 Exporting Functions

#| export
def word_count(text: str) -> int:
    """Count words in text.

    Parameters
    ----------
    text : str
        Input text to analyze

    Returns
    -------
    int
        Number of words

    Examples
    --------
    >>> word_count("Hello world")
    2
    >>> word_count("")
    0
    """
    if not text or not text.strip():
        return 0
    return len(text.split())

The #| export directive marks this cell for inclusion in the generated module.

9.4.3 Exploring as You Build

This is where notebooks shine. Between exported cells, add exploration:

# Not exported - just exploration
sample_text = """
The quick brown fox jumps over the lazy dog.
This is a sample paragraph for testing our text analysis functions.
"""

print(f"Word count: {word_count(sample_text)}")

Your notebook becomes both implementation AND documentation of your thinking.

9.5 3. Adding Tests with nbdev

9.5.1 Inline Doctests

The docstring examples above double as tests. nbdev's test runner executes notebook cells rather than doctests, so verify the >>> examples against the exported module with Python's built-in runner:

python -m doctest textkit/core.py

9.5.2 Dedicated Test Cells

For more complex checks, add ordinary cells without #| export. nbdev_test executes every cell in the notebook, so a failing assertion fails the test run:

# Test cell - executed by nbdev_test, not exported to the package
assert word_count("") == 0
assert word_count("   ") == 0
assert word_count("one") == 1
assert word_count("one two three") == 3
# Unicode handling
assert word_count("café résumé") == 2

9.5.3 Running Tests

# Run all tests
nbdev_test

# Run tests for specific notebook
nbdev_test --path nbs/00_core.ipynb

9.6 4. Building More Functionality

9.6.1 Readability Scores

#| export
def flesch_reading_ease(text: str) -> float:
    """Calculate Flesch Reading Ease score.

    Scores typically range from 0-100:
    - 90-100: Very easy (5th grade)
    - 60-70: Standard (8th-9th grade)
    - 0-30: Very difficult (college graduate)

    Examples
    --------
    >>> score = flesch_reading_ease("The cat sat on the mat.")
    >>> 90 <= score <= 120  # Simple sentence = high score
    True
    """
    words = word_count(text)
    sentences = sentence_count(text)
    syllables = syllable_count(text)

    if words == 0 or sentences == 0:
        return 0.0

    return (
        206.835
        - 1.015 * (words / sentences)
        - 84.6 * (syllables / words)
    )

9.6.2 Helper Functions

#| export
def sentence_count(text: str) -> int:
    """Count sentences in text.

    Examples
    --------
    >>> sentence_count("Hello. World!")
    2
    >>> sentence_count("No punctuation here")
    1
    """
    import re
    if not text.strip():
        return 0
    # Split on sentence-ending punctuation
    sentences = re.split(r'[.!?]+', text)
    # Filter empty strings
    return len([s for s in sentences if s.strip()])

#| export
def syllable_count(text: str) -> int:
    """Estimate syllable count (English approximation).

    Examples
    --------
    >>> syllable_count("hello")
    2
    >>> syllable_count("beautiful")
    3
    """
    import re
    text = text.lower()
    words = text.split()

    count = 0
    for word in words:
        word = re.sub(r'[^a-z]', '', word)
        if not word:
            continue
        # Simple heuristic: count vowel groups
        syllables = len(re.findall(r'[aeiouy]+', word))
        # Adjust for silent e
        if word.endswith('e') and syllables > 1:
            syllables -= 1
        count += max(1, syllables)

    return count
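To sanity-check the doctest in flesch_reading_ease, here is the arithmetic worked by hand in a non-exported cell. For "The cat sat on the mat." the helpers above return 6 words, 1 sentence, and 6 syllables:

```python
# Worked example (not exported): Flesch Reading Ease for "The cat sat on the mat."
# The counts below are what word_count / sentence_count / syllable_count produce.
words, sentences, syllables = 6, 1, 6

score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
print(round(score, 1))  # 116.1 - consistent with the 90 <= score <= 120 doctest
```

Scores above 100 are possible for very simple text, which is why the doctest's upper bound is 120 rather than 100.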

9.7 5. Visualizations in Your Notebook

Notebooks excel at visual exploration. Add analysis cells (not exported):

# Visualization - not exported, but shows in docs
import matplotlib.pyplot as plt

def visualize_readability(texts: dict[str, str]):
    """Compare readability across multiple texts."""
    names = list(texts.keys())
    scores = [flesch_reading_ease(t) for t in texts.values()]

    plt.figure(figsize=(10, 5))
    plt.barh(names, scores, color='steelblue')
    plt.xlabel('Flesch Reading Ease Score')
    plt.title('Readability Comparison')
    plt.axvline(x=60, color='red', linestyle='--', label='Standard difficulty')
    plt.legend()
    plt.tight_layout()
    plt.show()

# Demo with sample texts
samples = {
    "Children's book": "The cat sat. The dog ran. They played.",
    "News article": "The committee announced sweeping regulatory changes affecting multiple industries.",
    "Academic paper": "The epistemological ramifications of quantum indeterminacy necessitate reconceptualization.",
}

visualize_readability(samples)

This visualization appears in your generated documentation—showing users what the library can do.

9.8 6. Building the Text Analyzer Class

For a more complete API, add a class that combines functionality:

#| export
class TextAnalyzer:
    """Analyze text with multiple metrics.

    Examples
    --------
    >>> analyzer = TextAnalyzer("Hello world. How are you?")
    >>> analyzer.word_count
    5
    >>> analyzer.sentence_count
    2
    """

    def __init__(self, text: str):
        self.text = text
        self._word_count = None
        self._sentence_count = None

    @property
    def word_count(self) -> int:
        if self._word_count is None:
            self._word_count = word_count(self.text)
        return self._word_count

    @property
    def sentence_count(self) -> int:
        if self._sentence_count is None:
            self._sentence_count = sentence_count(self.text)
        return self._sentence_count

    @property
    def avg_words_per_sentence(self) -> float:
        if self.sentence_count == 0:
            return 0.0
        return self.word_count / self.sentence_count

    @property
    def readability(self) -> float:
        return flesch_reading_ease(self.text)

    def summary(self) -> dict:
        """Return all metrics as a dictionary."""
        return {
            "words": self.word_count,
            "sentences": self.sentence_count,
            "avg_words_per_sentence": round(self.avg_words_per_sentence, 1),
            "flesch_reading_ease": round(self.readability, 1),
        }

9.9 7. Adding an Interactive Widget

End with something users can interact with—demonstrating the notebook as an application:

# Interactive demo (not exported - for notebook/docs only)
import ipywidgets as widgets
from IPython.display import display

def create_analyzer_widget():
    """Create an interactive text analyzer."""

    text_input = widgets.Textarea(
        value='Enter your text here...',
        placeholder='Paste text to analyze',
        description='Text:',
        layout=widgets.Layout(width='100%', height='150px')
    )

    output = widgets.Output()

    def analyze(change):
        output.clear_output()
        with output:
            if text_input.value.strip():
                analyzer = TextAnalyzer(text_input.value)
                results = analyzer.summary()
                print("📊 Analysis Results")
                print("-" * 30)
                for key, value in results.items():
                    print(f"{key.replace('_', ' ').title()}: {value}")

    text_input.observe(analyze, names='value')

    display(widgets.VBox([
        widgets.HTML("<h3>📝 Text Analyzer</h3>"),
        text_input,
        output
    ]))

# Show the widget
create_analyzer_widget()

When viewed in Colab or Binder, users can interact with your library without installing anything.

9.10 8. Generating the Package

9.10.1 Export to Python Modules

nbdev_export

This generates textkit/core.py from your notebook’s #| export cells.

9.10.2 Verify Everything Works

# Run tests
nbdev_test

# Clean notebook metadata, then run the full pipeline
nbdev_clean
nbdev_prepare  # export + test + clean + rebuild README

9.10.3 The Generated Code

Look at textkit/core.py—it contains clean Python code generated from your notebooks, with proper imports and structure.

9.11 9. Documentation

9.11.1 The Index Notebook

nbs/index.ipynb becomes both your README.md and documentation homepage. Include:

  1. Installation instructions
  2. Quick start example
  3. Feature overview
# In nbs/index.ipynb

# TextKit

> Simple text analysis for Python

## Installation

pip install textkit

## Quick Start

from textkit.core import TextAnalyzer

text = "Your text here. Analyze it easily."
analyzer = TextAnalyzer(text)
print(analyzer.summary())

9.11.2 Build Documentation

nbdev_docs

This generates a Quarto-based documentation site in _docs/.

9.13 10. Publishing to PyPI

9.13.1 Prepare for Release

# Clean and prepare
nbdev_prepare

# Build distribution
python -m build

9.13.2 Publish

# Test PyPI first
twine upload --repository testpypi dist/*

# Then real PyPI
twine upload dist/*
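twine reads repository credentials from ~/.pypirc. A minimal sketch (the TestPyPI upload URL is real; the tokens are placeholders for your own API tokens):

```ini
[distutils]
index-servers =
    pypi
    testpypi

[pypi]
username = __token__
password = pypi-...your-api-token...

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = pypi-...your-test-token...
```

With this in place, the --repository testpypi flag above resolves to the [testpypi] section.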

9.13.3 The Result

pip install textkit

You’ve shipped a Python package—developed entirely in notebooks.

9.14 11. Sharing the Notebook Itself

Beyond the package, share the development notebook:

9.14.1 Colab Badge

[![Open In Colab](BADGE)](COLAB_LINK)

Replace BADGE and COLAB_LINK with your repo URLs.

9.14.2 Binder Badge

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/username/textkit/main)

Users can:

  1. Install the package via pip (traditional)
  2. Explore the notebook to understand the code (educational)
  3. Run interactively in Colab/Binder (zero-install)

9.15 Comparing Workflows

Here’s how this case study compares to the SimpleBot approach (Chapter 4):

| Aspect        | SimpleBot (Scripts)         | TextKit (nbdev)          |
|---------------|-----------------------------|--------------------------|
| Source files  | .py in src/                 | .ipynb in nbs/           |
| Tests         | Separate tests/ directory   | Inline with code         |
| Documentation | Separate docs/              | Generated from notebooks |
| Exploration   | Separate REPL/scratch files | Integrated in notebooks  |
| Output        | Package on PyPI             | Package on PyPI          |
| Best for      | Traditional dev, teams      | Exploratory, teaching    |

Both workflows produce the same result: a published package. Choose based on how you like to work.

9.16 When to Use This Workflow

The nbdev approach works best when:

  • Exploration is central: You’re figuring things out as you build
  • Teaching matters: Others will learn from your notebooks
  • Docs should show execution: You want live examples in documentation
  • Solo or small team: Git conflicts in notebooks are real

Consider traditional scripts when:

  • Large teams: Notebook diffs are harder to review
  • Complex architecture: Many interconnected modules
  • Heavy IDE reliance: Refactoring tools work better with .py files
  • Existing codebase: Converting to nbdev is non-trivial

9.17 Summary

  • nbdev inverts the workflow: Notebooks are source, .py files are generated
  • Tests live with code: Doctests and #| test cells eliminate context switching
  • Exploration becomes documentation: Your investigative work helps users
  • Same destination: Published package, installable via pip
  • Different journey: Iterative, visual, integrated

9.18 Exercises

  1. Extend TextKit: Add a sentiment_words() function that counts positive/negative words from a simple word list. Include doctests.

  2. Add a notebook: Create 01_advanced.ipynb with functions for text comparison (e.g., similarity between two texts).

  3. Publish to TestPyPI: Go through the full publication workflow to TestPyPI.

  4. Create a Voilà dashboard: Convert the interactive widget section into a standalone Voilà dashboard.

  5. Compare workflows: Take one function from TextKit and rewrite it in the traditional script workflow. Reflect on the differences.