{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Bab 8 — Unsupervised + Data Preparation Playground\n",
        "\n",
        "Notebook ini memperluas Bab 8: audit data mentah, cleansing, EDA/plotting, regresi linear untuk insight, k-means, PCA mini, dan anomali.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1. Definisi fungsi, data mentah, dan helper visualisasi\n",
        "Jalankan sel ini untuk memuat semua fungsi standard library.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#!/usr/bin/env python3\n",
        "\"\"\"Bab 8 — Unsupervised learning + data preparation playground.\n",
        "\n",
        "Kode ini memakai Python standard library agar bisa dijalankan dari terminal,\n",
        "VS Code, Jupyter, Google Colab, dan Kaggle tanpa instalasi tambahan.\n",
        "\n",
        "Yang dipraktikkan:\n",
        "- audit data mentah,\n",
        "- cleansing: duplikasi, nilai negatif, missing value,\n",
        "- preprocessing: standardisasi fitur,\n",
        "- EDA: ringkasan statistik dan plot SVG,\n",
        "- regresi linear sederhana untuk membaca tren dan residual,\n",
        "- k-means dari nol,\n",
        "- inertia, PCA 2D mini, cosine similarity,\n",
        "- anomaly detection dengan z-score.\n",
        "\"\"\"\n",
        "\n",
        "from __future__ import annotations\n",
        "\n",
        "import math\n",
        "import random\n",
        "from dataclasses import dataclass\n",
        "from pathlib import Path\n",
        "from typing import Iterable, List, Optional, Sequence, Tuple\n",
        "\n",
        "\n",
        "@dataclass(frozen=True)\n",
        "class RawCustomer:\n",
        "    name: str\n",
        "    visits: Optional[float]\n",
        "    spend: Optional[float]\n",
        "    coffee_milk_percent: Optional[float]\n",
        "\n",
        "\n",
        "@dataclass(frozen=True)\n",
        "class Customer:\n",
        "    name: str\n",
        "    visits: float\n",
        "    spend: float\n",
        "    coffee_milk_percent: float\n",
        "\n",
        "\n",
        "RAW_CUSTOMERS = [\n",
        "    RawCustomer(\"Ayu\", 10, 50, 80),\n",
        "    RawCustomer(\"Bima\", 11, 52, 75),\n",
        "    RawCustomer(\"Citra\", 2, 15, 10),\n",
        "    RawCustomer(\"Dedi\", -3, 17, 20),  # invalid: kunjungan negatif\n",
        "    RawCustomer(\"Eka\", 9, None, 78),  # missing spend\n",
        "    RawCustomer(\"Fajar\", 4, 18, 25),\n",
        "    RawCustomer(\"Gita\", 1, 90, 5),  # kandidat anomali belanja\n",
        "    RawCustomer(\"Ayu\", 10, 50, 80),  # duplikat identik\n",
        "    RawCustomer(\"Hani\", 12, 58, 82),\n",
        "]\n",
        "\n",
        "\n",
        "def mean(values: Iterable[float]) -> float:\n",
        "    values = list(values)\n",
        "    return sum(values) / len(values)\n",
        "\n",
        "\n",
        "def median(values: Iterable[float]) -> float:\n",
        "    values = sorted(values)\n",
        "    mid = len(values) // 2\n",
        "    if len(values) % 2 == 1:\n",
        "        return values[mid]\n",
        "    return (values[mid - 1] + values[mid]) / 2\n",
        "\n",
        "\n",
        "def population_std(values: Iterable[float]) -> float:\n",
        "    values = list(values)\n",
        "    mu = mean(values)\n",
        "    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))\n",
        "\n",
        "\n",
        "def audit_raw_data(rows: Sequence[RawCustomer]) -> dict:\n",
        "    total_cells = len(rows) * 3\n",
        "    missing = sum(\n",
        "        value is None\n",
        "        for row in rows\n",
        "        for value in (row.visits, row.spend, row.coffee_milk_percent)\n",
        "    )\n",
        "    negative_visits = sum((row.visits is not None and row.visits < 0) for row in rows)\n",
        "    duplicate_rows = len(rows) - len(set(rows))\n",
        "    return {\n",
        "        \"rows\": len(rows),\n",
        "        \"missing_cells\": missing,\n",
        "        \"missing_rate\": missing / total_cells,\n",
        "        \"negative_visits\": negative_visits,\n",
        "        \"duplicate_rows\": duplicate_rows,\n",
        "    }\n",
        "\n",
        "\n",
        "def clean_customers(rows: Sequence[RawCustomer]) -> Tuple[List[Customer], List[str]]:\n",
        "    \"\"\"Clean raw rows and return cleaned customers + action log.\"\"\"\n",
        "    log: List[str] = []\n",
        "    seen = set()\n",
        "    deduped: List[RawCustomer] = []\n",
        "    for row in rows:\n",
        "        if row in seen:\n",
        "            log.append(f\"hapus duplikat identik: {row.name}\")\n",
        "            continue\n",
        "        seen.add(row)\n",
        "        deduped.append(row)\n",
        "\n",
        "    valid_spend = [row.spend for row in deduped if row.spend is not None]\n",
        "    spend_fill = median(valid_spend)\n",
        "    log.append(f\"imputasi spend kosong dengan median={spend_fill:.2f}\")\n",
        "\n",
        "    cleaned: List[Customer] = []\n",
        "    for row in deduped:\n",
        "        if row.visits is None or row.coffee_milk_percent is None:\n",
        "            log.append(f\"buang {row.name}: visits/kopi_susu kosong\")\n",
        "            continue\n",
        "        if row.visits < 0:\n",
        "            log.append(f\"buang {row.name}: kunjungan negatif ({row.visits})\")\n",
        "            continue\n",
        "        if not (0 <= row.coffee_milk_percent <= 100):\n",
        "            log.append(f\"buang {row.name}: persen kopi susu di luar 0-100\")\n",
        "            continue\n",
        "        cleaned.append(\n",
        "            Customer(\n",
        "                row.name,\n",
        "                float(row.visits),\n",
        "                float(row.spend if row.spend is not None else spend_fill),\n",
        "                float(row.coffee_milk_percent),\n",
        "            )\n",
        "        )\n",
        "    return cleaned, log\n",
        "\n",
        "\n",
        "def as_matrix(customers: Sequence[Customer]) -> List[List[float]]:\n",
        "    return [[c.visits, c.spend, c.coffee_milk_percent] for c in customers]\n",
        "\n",
        "\n",
        "def transpose(matrix: Sequence[Sequence[float]]) -> List[List[float]]:\n",
        "    return [list(col) for col in zip(*matrix)]\n",
        "\n",
        "\n",
        "def standardize(matrix: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float], List[float]]:\n",
        "    columns = transpose(matrix)\n",
        "    mus = [mean(col) for col in columns]\n",
        "    sigmas = [population_std(col) or 1.0 for col in columns]\n",
        "    scaled = []\n",
        "    for row in matrix:\n",
        "        scaled.append([(x - mu) / sigma for x, mu, sigma in zip(row, mus, sigmas)])\n",
        "    return scaled, mus, sigmas\n",
        "\n",
        "\n",
        "def euclidean(a: Sequence[float], b: Sequence[float]) -> float:\n",
        "    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))\n",
        "\n",
        "\n",
        "def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:\n",
        "    dot = sum(x * y for x, y in zip(a, b))\n",
        "    norm_a = math.sqrt(sum(x * x for x in a))\n",
        "    norm_b = math.sqrt(sum(y * y for y in b))\n",
        "    if norm_a == 0 or norm_b == 0:\n",
        "        return 0.0\n",
        "    return dot / (norm_a * norm_b)\n",
        "\n",
        "\n",
        "def centroid(rows: Sequence[Sequence[float]]) -> List[float]:\n",
        "    return [mean(col) for col in transpose(rows)]\n",
        "\n",
        "\n",
        "def kmeans(matrix: Sequence[Sequence[float]], k: int = 2, iterations: int = 12, seed: int = 42):\n",
        "    random.seed(seed)\n",
        "    centers = [list(row) for row in random.sample(list(matrix), k)]\n",
        "    assignments = [0 for _ in matrix]\n",
        "    history = []\n",
        "\n",
        "    for _ in range(iterations):\n",
        "        assignments = []\n",
        "        for row in matrix:\n",
        "            distances = [euclidean(row, center) for center in centers]\n",
        "            assignments.append(min(range(k), key=lambda idx: distances[idx]))\n",
        "\n",
        "        new_centers = []\n",
        "        for cluster_id in range(k):\n",
        "            members = [row for row, assigned in zip(matrix, assignments) if assigned == cluster_id]\n",
        "            new_centers.append(centroid(members) if members else centers[cluster_id])\n",
        "        history.append((assignments[:], [c[:] for c in new_centers]))\n",
        "        if new_centers == centers:\n",
        "            break\n",
        "        centers = new_centers\n",
        "\n",
        "    return assignments, centers, history\n",
        "\n",
        "\n",
        "def inertia(matrix: Sequence[Sequence[float]], assignments: Sequence[int], centers: Sequence[Sequence[float]]) -> float:\n",
        "    return sum(euclidean(row, centers[cluster_id]) ** 2 for row, cluster_id in zip(matrix, assignments))\n",
        "\n",
        "\n",
        "def covariance_2d(points: Sequence[Sequence[float]]) -> Tuple[float, float, float]:\n",
        "    xs = [p[0] for p in points]\n",
        "    ys = [p[1] for p in points]\n",
        "    mux, muy = mean(xs), mean(ys)\n",
        "    var_x = mean((x - mux) ** 2 for x in xs)\n",
        "    var_y = mean((y - muy) ** 2 for y in ys)\n",
        "    cov_xy = mean((x - mux) * (y - muy) for x, y in zip(xs, ys))\n",
        "    return var_x, cov_xy, var_y\n",
        "\n",
        "\n",
        "def first_principal_component_2d(points: Sequence[Sequence[float]]) -> Tuple[List[float], float]:\n",
        "    a, b, d = covariance_2d(points)\n",
        "    trace = a + d\n",
        "    determinant = a * d - b * b\n",
        "    delta = math.sqrt(max(0.0, trace * trace - 4 * determinant))\n",
        "    lambda1 = (trace + delta) / 2\n",
        "\n",
        "    if abs(b) > 1e-12:\n",
        "        vector = [b, lambda1 - a]\n",
        "    elif a >= d:\n",
        "        vector = [1.0, 0.0]\n",
        "    else:\n",
        "        vector = [0.0, 1.0]\n",
        "    norm = math.sqrt(vector[0] ** 2 + vector[1] ** 2) or 1.0\n",
        "    return [vector[0] / norm, vector[1] / norm], lambda1\n",
        "\n",
        "\n",
        "def project_2d(points: Sequence[Sequence[float]], component: Sequence[float]) -> List[float]:\n",
        "    xs = [p[0] for p in points]\n",
        "    ys = [p[1] for p in points]\n",
        "    mux, muy = mean(xs), mean(ys)\n",
        "    return [((x - mux) * component[0] + (y - muy) * component[1]) for x, y in points]\n",
        "\n",
        "\n",
        "def z_scores(values: Sequence[float]) -> List[float]:\n",
        "    mu = mean(values)\n",
        "    sigma = population_std(values) or 1.0\n",
        "    return [(x - mu) / sigma for x in values]\n",
        "\n",
        "\n",
        "def linear_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:\n",
        "    xbar, ybar = mean(xs), mean(ys)\n",
        "    numerator = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))\n",
        "    denominator = sum((x - xbar) ** 2 for x in xs) or 1.0\n",
        "    w = numerator / denominator\n",
        "    b = ybar - w * xbar\n",
        "    return w, b\n",
        "\n",
        "\n",
        "def residuals(xs: Sequence[float], ys: Sequence[float], w: float, b: float) -> List[float]:\n",
        "    return [y - (w * x + b) for x, y in zip(xs, ys)]\n",
        "\n",
        "\n",
        "def scale(value: float, lo: float, hi: float, out_lo: float, out_hi: float) -> float:\n",
        "    if hi == lo:\n",
        "        return (out_lo + out_hi) / 2\n",
        "    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)\n",
        "\n",
        "\n",
        "def write_svg(path: Path, body: str, title: str, subtitle: str) -> None:\n",
        "    path.parent.mkdir(parents=True, exist_ok=True)\n",
        "    path.write_text(\n",
        "        f'''<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"720\" height=\"420\" viewBox=\"0 0 720 420\" role=\"img\" aria-label=\"{title}\">\n",
        "<rect width=\"720\" height=\"420\" fill=\"#f8fafc\"/>\n",
        "<rect x=\"24\" y=\"22\" width=\"672\" height=\"376\" rx=\"22\" fill=\"#ffffff\" stroke=\"#cbd5e1\" stroke-width=\"2\"/>\n",
        "<text x=\"50\" y=\"58\" font-family=\"Arial\" font-size=\"24\" font-weight=\"700\" fill=\"#0f172a\">{title}</text>\n",
        "<text x=\"50\" y=\"84\" font-family=\"Arial\" font-size=\"15\" fill=\"#475569\">{subtitle}</text>\n",
        "{body}\n",
        "</svg>''',\n",
        "        encoding=\"utf-8\",\n",
        "    )\n",
        "\n",
        "\n",
        "def plot_scatter_clusters(customers: Sequence[Customer], assignments: Sequence[int], out: Path) -> None:\n",
        "    xs = [c.visits for c in customers]\n",
        "    ys = [c.spend for c in customers]\n",
        "    colors = [\"#2563eb\", \"#16a34a\", \"#f97316\", \"#9333ea\"]\n",
        "    parts = ['<path d=\"M80 340 L650 340 M80 340 L80 105\" stroke=\"#334155\"/>']\n",
        "    for c, cluster_id in zip(customers, assignments):\n",
        "        x = scale(c.visits, min(xs), max(xs), 95, 640)\n",
        "        y = scale(c.spend, min(ys), max(ys), 330, 115)\n",
        "        parts.append(f'<circle cx=\"{x:.1f}\" cy=\"{y:.1f}\" r=\"9\" fill=\"{colors[cluster_id % len(colors)]}\"/><text x=\"{x+10:.1f}\" y=\"{y-8:.1f}\" font-size=\"12\">{c.name}</text>')\n",
        "    parts.append('<text x=\"310\" y=\"380\" font-size=\"14\">kunjungan</text><text x=\"26\" y=\"230\" font-size=\"14\" transform=\"rotate(-90 26,230)\">belanja</text>')\n",
        "    write_svg(out, \"\\n\".join(parts), \"Scatter Cluster Pelanggan\", \"Setiap titik adalah pelanggan; warna adalah cluster k-means\")\n",
        "\n",
        "\n",
        "def plot_histogram(values: Sequence[float], out: Path) -> None:\n",
        "    bins = 5\n",
        "    lo, hi = min(values), max(values)\n",
        "    width = (hi - lo) / bins or 1\n",
        "    counts = [0] * bins\n",
        "    for v in values:\n",
        "        idx = min(bins - 1, int((v - lo) / width))\n",
        "        counts[idx] += 1\n",
        "    max_count = max(counts) or 1\n",
        "    parts = ['<path d=\"M80 340 L650 340 M80 340 L80 105\" stroke=\"#334155\"/>']\n",
        "    for i, count in enumerate(counts):\n",
        "        h = scale(count, 0, max_count, 0, 210)\n",
        "        x = 110 + i * 95\n",
        "        y = 340 - h\n",
        "        parts.append(f'<rect x=\"{x}\" y=\"{y:.1f}\" width=\"65\" height=\"{h:.1f}\" fill=\"#60a5fa\"/><text x=\"{x+20}\" y=\"{y-8:.1f}\" font-size=\"12\">{count}</text>')\n",
        "    parts.append('<text x=\"300\" y=\"380\" font-size=\"14\">rentang belanja</text>')\n",
        "    write_svg(out, \"\\n\".join(parts), \"Histogram Belanja\", \"Melihat distribusi satu fitur sebelum model\")\n",
        "\n",
        "\n",
        "def plot_regression(customers: Sequence[Customer], w: float, b: float, out: Path) -> None:\n",
        "    xs = [c.visits for c in customers]\n",
        "    ys = [c.spend for c in customers]\n",
        "    x_min, x_max = min(xs), max(xs)\n",
        "    y_values = ys + [w * x_min + b, w * x_max + b]\n",
        "    parts = ['<path d=\"M80 340 L650 340 M80 340 L80 105\" stroke=\"#334155\"/>']\n",
        "    x1 = scale(x_min, x_min, x_max, 95, 640)\n",
        "    y1 = scale(w * x_min + b, min(y_values), max(y_values), 330, 115)\n",
        "    x2 = scale(x_max, x_min, x_max, 95, 640)\n",
        "    y2 = scale(w * x_max + b, min(y_values), max(y_values), 330, 115)\n",
        "    parts.append(f'<path d=\"M{x1:.1f} {y1:.1f} L{x2:.1f} {y2:.1f}\" stroke=\"#ef4444\" stroke-width=\"4\"/>')\n",
        "    for c in customers:\n",
        "        x = scale(c.visits, x_min, x_max, 95, 640)\n",
        "        y = scale(c.spend, min(y_values), max(y_values), 330, 115)\n",
        "        yhat = scale(w * c.visits + b, min(y_values), max(y_values), 330, 115)\n",
        "        parts.append(f'<circle cx=\"{x:.1f}\" cy=\"{y:.1f}\" r=\"8\" fill=\"#2563eb\"/><path d=\"M{x:.1f} {y:.1f} L{x:.1f} {yhat:.1f}\" stroke=\"#f97316\" stroke-dasharray=\"4 4\"/><text x=\"{x+9:.1f}\" y=\"{y-6:.1f}\" font-size=\"12\">{c.name}</text>')\n",
        "    write_svg(out, \"\\n\".join(parts), \"Regresi Linear dan Residual\", \"Garis merah adalah tren; garis putus-putus adalah residual\")\n",
        "\n",
        "\n",
        "def plot_anomaly_z(customers: Sequence[Customer], z_values: Sequence[float], out: Path) -> None:\n",
        "    parts = ['<path d=\"M80 340 L650 340 M80 340 L80 105\" stroke=\"#334155\"/>']\n",
        "    for i, (c, z) in enumerate(zip(customers, z_values)):\n",
        "        x = 110 + i * 70\n",
        "        y = scale(z, min(z_values), max(z_values), 330, 120)\n",
        "        color = \"#ef4444\" if abs(z) >= 1.5 else \"#10b981\"\n",
        "        parts.append(f'<circle cx=\"{x}\" cy=\"{y:.1f}\" r=\"9\" fill=\"{color}\"/><text x=\"{x-14}\" y=\"360\" font-size=\"12\">{c.name}</text><text x=\"{x-12}\" y=\"{y-12:.1f}\" font-size=\"12\">{z:.1f}</text>')\n",
        "    parts.append('<text x=\"92\" y=\"113\" font-size=\"13\" fill=\"#ef4444\">|z| besar = perlu diperiksa</text>')\n",
        "    write_svg(out, \"\\n\".join(parts), \"Z-score Anomali Belanja\", \"Titik merah bukan vonis; hanya sinyal investigasi\")\n",
        "\n",
        "\n",
        "def print_table(customers: Sequence[Customer], assignments: Sequence[int], z_spend: Sequence[float], res: Sequence[float]) -> None:\n",
        "    print(\"Nama     kunjungan belanja kopi_susu% cluster z_belanja residual\")\n",
        "    print(\"-\" * 74)\n",
        "    for customer, cluster_id, z, e in zip(customers, assignments, z_spend, res):\n",
        "        print(\n",
        "            f\"{customer.name:<8} {customer.visits:>8.0f} {customer.spend:>7.0f}\"\n",
        "            f\" {customer.coffee_milk_percent:>10.0f} {cluster_id:>7} {z:>9.2f} {e:>8.2f}\"\n",
        "        )\n",
        "\n",
        "\n",
        "def main() -> None:\n",
        "    print(\"=== Bab 8: Unsupervised + Data Preparation Playground ===\")\n",
        "    audit = audit_raw_data(RAW_CUSTOMERS)\n",
        "    print(\"\\nAudit data mentah:\", audit)\n",
        "\n",
        "    customers, cleaning_log = clean_customers(RAW_CUSTOMERS)\n",
        "    print(\"\\nLog cleansing:\")\n",
        "    for item in cleaning_log:\n",
        "        print(\"-\", item)\n",
        "\n",
        "    matrix = as_matrix(customers)\n",
        "    scaled, mus, sigmas = standardize(matrix)\n",
        "    print(\"\\nFitur: visits, spend, coffee_milk_percent\")\n",
        "    print(\"Mean fitur:\", [round(x, 2) for x in mus])\n",
        "    print(\"Std fitur:\", [round(x, 2) for x in sigmas])\n",
        "\n",
        "    dist_ab = euclidean([customers[0].visits, customers[0].spend], [customers[1].visits, customers[1].spend])\n",
        "    dist_ac = euclidean([customers[0].visits, customers[0].spend], [customers[2].visits, customers[2].spend])\n",
        "    print(\"\\nJarak mentah Ayu-Bima:\", round(dist_ab, 2))\n",
        "    print(\"Jarak mentah Ayu-Citra:\", round(dist_ac, 2))\n",
        "\n",
        "    visits = [c.visits for c in customers]\n",
        "    spend = [c.spend for c in customers]\n",
        "    w, b = linear_regression(visits, spend)\n",
        "    res = residuals(visits, spend, w, b)\n",
        "    print(\"\\nRegresi linear insight: spend_hat = w*visits + b\")\n",
        "    print(\"w:\", round(w, 3), \"b:\", round(b, 3))\n",
        "    print(\"Residual:\", [round(x, 2) for x in res])\n",
        "\n",
        "    assignments, centers, history = kmeans(scaled, k=2, iterations=12, seed=7)\n",
        "    score = inertia(scaled, assignments, centers)\n",
        "    spend_z = z_scores(spend)\n",
        "    print(\"\\nK-means assignments:\", assignments)\n",
        "    print(\"Inertia:\", round(score, 3))\n",
        "    print(\"\\nTabel hasil:\")\n",
        "    print_table(customers, assignments, spend_z, res)\n",
        "\n",
        "    # PCA mini memakai dua fitur pertama setelah scaling: visits dan spend.\n",
        "    points_2d = [[row[0], row[1]] for row in scaled]\n",
        "    pc1, eig = first_principal_component_2d(points_2d)\n",
        "    projections = project_2d(points_2d, pc1)\n",
        "    print(\"\\nPCA 2D mini\")\n",
        "    print(\"Komponen utama pertama:\", [round(x, 3) for x in pc1])\n",
        "    print(\"Eigenvalue:\", round(eig, 3))\n",
        "    print(\"Proyeksi:\", [round(x, 3) for x in projections])\n",
        "\n",
        "    print(\"\\nCosine similarity [1,1] vs [2,2]:\", round(cosine_similarity([1, 1], [2, 2]), 3))\n",
        "    print(\"Cosine similarity [1,0] vs [0,1]:\", round(cosine_similarity([1, 0], [0, 1]), 3))\n",
        "\n",
        "    script_dir = Path(__file__).resolve().parent if \"__file__\" in globals() else Path.cwd()\n",
        "    output_dir = script_dir / \"outputs\"\n",
        "    plot_scatter_clusters(customers, assignments, output_dir / \"scatter_clusters.svg\")\n",
        "    plot_histogram(spend, output_dir / \"histogram_spend.svg\")\n",
        "    plot_regression(customers, w, b, output_dir / \"linear_regression_residuals.svg\")\n",
        "    plot_anomaly_z(customers, spend_z, output_dir / \"anomaly_zscore.svg\")\n",
        "    print(\"\\nPlot SVG dibuat di:\", output_dir)\n",
        "\n",
        "    print(\"\\nInterpretasi aman:\")\n",
        "    print(\"- Cleansing adalah keputusan analitis; catat lognya.\")\n",
        "    print(\"- Cluster adalah alat eksplorasi, bukan label kebenaran.\")\n",
        "    print(\"- Residual besar dan |z| besar adalah sinyal investigasi, bukan vonis.\")\n",
        "    print(\"- Nama cluster harus netral, misalnya 'rutin bernilai sedang'.\")\n",
        "\n",
        "\n"
      ]
    },
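    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`clean_customers` fills Eka's missing `spend` with the median of the observed values. As an illustration (not part of the pipeline), the sketch below compares that choice with a mean fill on the same numbers: the non-missing `spend` values of the deduplicated rows. Dedi's 17 is included because `clean_customers` computes the median before it drops invalid rows. The 900 is a hypothetical typo, used only to show how one bad value drags the mean but leaves the median alone.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Sketch: mean vs median imputation for a missing value.\n",
        "# statistics is used via the module name so the mean/median helpers from section 1 are not shadowed.\n",
        "import statistics\n",
        "\n",
        "# Non-missing spend of the deduplicated RAW_CUSTOMERS rows (Ayu, Bima, Citra, Dedi, Fajar, Gita, Hani).\n",
        "observed_spend = [50, 52, 15, 17, 18, 90, 58]\n",
        "print(f'mean fill:   {statistics.mean(observed_spend):.2f}')\n",
        "print(f'median fill: {statistics.median(observed_spend):.2f}')\n",
        "\n",
        "# Hypothetical what-if: Gita's 90 was mistyped as 900.\n",
        "with_typo = [900 if v == 90 else v for v in observed_spend]\n",
        "print(f'mean fill with typo:   {statistics.mean(with_typo):.2f}')\n",
        "print(f'median fill with typo: {statistics.median(with_typo):.2f}')\n"
      ]
    },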
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2. Jalankan pipeline utama\n",
        "Pipeline akan membuat file SVG di folder `outputs/`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "main()\n"
      ]
    },
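    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`main()` prints raw Ayu-Bima and Ayu-Citra distances on `visits` and `spend`, then clusters on standardized features. A minimal sketch of why the scaling matters: it splits the squared Euclidean distance between Ayu and Citra into per-feature shares. It does this first in raw units, then after dividing each feature by its population standard deviation, as `standardize` does. Subtracting the mean cancels out in a difference, so only the divisor matters here. The rows restate the seven cleaned customers, with Eka's `spend` set to the median fill of 50.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Sketch: per-feature share of a squared Euclidean distance, raw vs standardized.\n",
        "import math\n",
        "\n",
        "# (visits, spend, coffee_milk_percent) of the cleaned customers; Eka's spend is the median fill.\n",
        "cleaned_rows = {\n",
        "    'Ayu': (10, 50, 80),\n",
        "    'Bima': (11, 52, 75),\n",
        "    'Citra': (2, 15, 10),\n",
        "    'Eka': (9, 50, 78),\n",
        "    'Fajar': (4, 18, 25),\n",
        "    'Gita': (1, 90, 5),\n",
        "    'Hani': (12, 58, 82),\n",
        "}\n",
        "feature_names = ('visits', 'spend', 'coffee_milk_percent')\n",
        "\n",
        "\n",
        "def population_sigmas(rows):\n",
        "    # Population standard deviation of each column (divide by n); 1.0 if the column is constant.\n",
        "    sigmas = []\n",
        "    for col in zip(*rows):\n",
        "        mu = sum(col) / len(col)\n",
        "        sigmas.append(math.sqrt(sum((x - mu) ** 2 for x in col) / len(col)) or 1.0)\n",
        "    return sigmas\n",
        "\n",
        "\n",
        "def distance_shares(a, b, divisors):\n",
        "    # Fraction of the total squared distance contributed by each feature.\n",
        "    gaps = [((x - y) / s) ** 2 for x, y, s in zip(a, b, divisors)]\n",
        "    total = sum(gaps) or 1.0\n",
        "    return [g / total for g in gaps]\n",
        "\n",
        "\n",
        "sigmas_demo = population_sigmas(cleaned_rows.values())\n",
        "raw_shares = distance_shares(cleaned_rows['Ayu'], cleaned_rows['Citra'], [1.0, 1.0, 1.0])\n",
        "scaled_shares = distance_shares(cleaned_rows['Ayu'], cleaned_rows['Citra'], sigmas_demo)\n",
        "for name, r, s in zip(feature_names, raw_shares, scaled_shares):\n",
        "    print(f'{name:<20} raw {r:6.1%}   standardized {s:6.1%}')\n"
      ]
    },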
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3. Latihan hitung cepat\n",
        "Bandingkan hasil kode dengan hitungan manual di draft.\n"
      ]
    },
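    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Here is a quick check that needs no helpers from section 1. `z_scores` divides by the population standard deviation, and in that case no value can have |z| larger than sqrt(n - 1) (Samuelson's inequality). With seven customers the ceiling is sqrt(6), about 2.45, so a textbook cutoff such as |z| >= 3 could never fire on this data. That is one plausible reason a lower threshold like the 1.5 in `plot_anomaly_z` is workable here, although the notebook does not state its reasoning. The sketch builds the worst case (one value apart from six identical ones) and compares its |z| with the bound.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Sketch: under a population standard deviation the largest possible |z| is sqrt(n - 1).\n",
        "import math\n",
        "\n",
        "\n",
        "def population_z(values):\n",
        "    # Same formula as z_scores in section 1, repeated so this cell is self-contained.\n",
        "    mu = sum(values) / len(values)\n",
        "    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / len(values)) or 1.0\n",
        "    return [(x - mu) / sigma for x in values]\n",
        "\n",
        "\n",
        "n_demo = 7\n",
        "worst_case = [0.0] * (n_demo - 1) + [1.0]  # one value apart from all the others\n",
        "largest_z = max(abs(z) for z in population_z(worst_case))\n",
        "print('largest |z| possible with n = 7:', round(largest_z, 3))\n",
        "print('bound sqrt(n - 1):', round(math.sqrt(n_demo - 1), 3))\n"
      ]
    },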
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"Jarak [2,4] ke [10,12] =\", round(euclidean([2,4], [10,12]), 3))\n",
        "print(\"Cosine [1,0] vs [0,1] =\", round(cosine_similarity([1,0], [0,1]), 3))\n",
        "print(\"Regresi [1,2,3] -> [2,4,6] =\", linear_regression([1,2,3], [2,4,6]))\n"
      ]
    },
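    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`first_principal_component_2d` uses the closed form for a symmetric 2x2 covariance matrix [[a, b], [b, d]]. The largest eigenvalue is (trace + sqrt(trace^2 - 4 det)) / 2, and when b is nonzero, [b, lambda1 - a] is a matching eigenvector. The cell below is a minimal self-contained check of that formula on a hypothetical correlation r = 0.8 between two standardized features. Their covariance matrix is [[1, r], [r, 1]], with eigenvalues 1 + r and 1 - r. The check confirms A v = lambda1 v and reports lambda1 / trace, the share of variance that PC1 captures.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Sketch: verify the 2x2 closed-form eigenpair against A v = lambda v.\n",
        "import math\n",
        "\n",
        "\n",
        "def top_eigenpair_2x2(a, b, d):\n",
        "    # Largest eigenvalue and unit eigenvector of [[a, b], [b, d]] (same formula as section 1).\n",
        "    trace = a + d\n",
        "    det = a * d - b * b\n",
        "    lam = (trace + math.sqrt(max(0.0, trace * trace - 4 * det))) / 2\n",
        "    if abs(b) > 1e-12:\n",
        "        v = [b, lam - a]\n",
        "    elif a >= d:\n",
        "        v = [1.0, 0.0]\n",
        "    else:\n",
        "        v = [0.0, 1.0]\n",
        "    norm = math.hypot(v[0], v[1])\n",
        "    return [v[0] / norm, v[1] / norm], lam\n",
        "\n",
        "\n",
        "r_demo = 0.8  # hypothetical correlation between two standardized features\n",
        "(vx, vy), lam1 = top_eigenpair_2x2(1.0, r_demo, 1.0)\n",
        "av = [1.0 * vx + r_demo * vy, r_demo * vx + 1.0 * vy]  # A @ v, written out\n",
        "print('PC1:', [round(vx, 4), round(vy, 4)], 'eigenvalue:', round(lam1, 4))\n",
        "print('A v == lambda1 v:', all(math.isclose(x, lam1 * y) for x, y in zip(av, (vx, vy))))\n",
        "print('explained variance ratio:', round(lam1 / (1.0 + 1.0), 4))\n"
      ]
    },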
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4. Eksperimen mandiri\n",
        "Tambahkan baris mentah baru, ubah nilai missing/outlier, lalu jalankan ulang `clean_customers`, `kmeans`, dan plot SVG.\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}