# Cheatsheet 2

Oke, ini **cheat-sheet Python (pandas)** buat **subset, grouping, dan filtering** — dari **paling dasar sampai yang sering dipakai di data real**. Contohnya aku sesuaikan dengan data kamu (provinsi, kab, gender, dll).

---

# 1️⃣ SUBSET (ambil sebagian data)

## a) Subset kolom

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5B%5B%22provinsi%22%2C-%22kab"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[[<span class="hljs-string">"provinsi"</span>, <span class="hljs-string">"kab"</span>, <span class="hljs-string">"gender_ketua_kelompok"</span>]]`</div></div>## b) Subset baris (index-based)

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.iloc%5B%3A10%5D-%23-10-ba"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.iloc[:<span class="hljs-number">10</span>]        <span class="hljs-comment"># 10 baris pertama</span>df.iloc[<span class="hljs-number">5</span>:<span class="hljs-number">20</span>]       <span class="hljs-comment"># slice</span>`</div></div><div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.loc%5B100%3A200%5D-%23-be"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.loc[<span class="hljs-number">100</span>:<span class="hljs-number">200</span>]     <span class="hljs-comment"># berdasarkan index label</span>`</div></div>## c) Subset baris + kondisi

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5Bdf%5B%22provinsi%22%5D-%3D%3D"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[df[<span class="hljs-string">"provinsi"</span>] == <span class="hljs-string">"JAWA BARAT"</span>]`</div></div>---

# 2️⃣ FILTER (kondisi logika)

## a) Filter satu kondisi

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5Bdf%5B%22gender_ketua_"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[df[<span class="hljs-string">"gender_ketua_kelompok"</span>] == <span class="hljs-string">"L"</span>]`</div></div>## b) Multiple kondisi (`&`, `|`)

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5B-%28df%5B%22provinsi%22%5D-"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[    (df[<span class="hljs-string">"provinsi"</span>] == <span class="hljs-string">"JAWA BARAT"</span>) &    (df[<span class="hljs-string">"gender_ketua_kelompok"</span>] == <span class="hljs-string">"P"</span>)]`</div></div>⚠️ WAJIB pakai tanda kurung!

---

## c) Filter dengan `isin`

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5Bdf%5B%22provinsi%22%5D.is"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[df[<span class="hljs-string">"provinsi"</span>].isin([<span class="hljs-string">"JAWA BARAT"</span>, <span class="hljs-string">"JAWA TENGAH"</span>])]`</div></div>## d) Filter string (`contains`, `startswith`)

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5Bdf%5B%22nama_kelompok"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[df[<span class="hljs-string">"nama_kelompok"</span>].<span class="hljs-built_in">str</span>.contains(<span class="hljs-string">"NELAYAN"</span>, <span class="hljs-keyword">case</span>=<span class="hljs-literal">False</span>, na=<span class="hljs-literal">False</span>)]`</div></div>---

## e) Filter angka

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5Bdf%5B%22skor_kelompok"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[df[<span class="hljs-string">"skor_kelompok"</span>] >= <span class="hljs-number">80</span>]`</div></div>---

# 3️⃣ GROUPING (ringkas data)

## a) Group by satu kolom

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.groupby%28%22provinsi"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.groupby(<span class="hljs-string">"provinsi"</span>).size()`</div></div>Sama dengan:

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5B%22provinsi%22%5D.value"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[<span class="hljs-string">"provinsi"</span>].value_counts()`</div></div>---

## b) Group + agregasi

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.groupby%28%22provinsi-1"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.groupby(<span class="hljs-string">"provinsi"</span>)[<span class="hljs-string">"nama_kelompok"</span>].nunique()`</div></div>---

## c) Group by multiple kolom

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.groupby%28%5B%22provins"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.groupby([<span class="hljs-string">"provinsi"</span>, <span class="hljs-string">"gender_ketua_kelompok"</span>]).size()`</div></div>---

## d) Banyak agregasi sekaligus

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.groupby%28%22provinsi-2"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.groupby(<span class="hljs-string">"provinsi"</span>).agg(    total_kelompok=(<span class="hljs-string">"nama_kelompok"</span>, <span class="hljs-string">"count"</span>),    unik_kelompok=(<span class="hljs-string">"nama_kelompok"</span>, <span class="hljs-string">"nunique"</span>),    rata_skor=(<span class="hljs-string">"skor_kelompok"</span>, <span class="hljs-string">"mean"</span>),)`</div></div>---

# 4️⃣ SUBSET + GROUP + FILTER (REAL CASE)

### ❓ Jumlah kelompok per provinsi, hanya yang &gt; 1000

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-%28-df.groupby%28%22provin"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`(    df.groupby(<span class="hljs-string">"provinsi"</span>)      .size()      .reset_index(name=<span class="hljs-string">"jumlah"</span>)      .query(<span class="hljs-string">"jumlah > 1000"</span>)      .sort_values(<span class="hljs-string">"jumlah"</span>, ascending=<span class="hljs-literal">False</span>))`</div></div>---

### ❓ Distribusi gender per provinsi

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-%28-df.groupby%28%5B%22provi"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`(    df.groupby([<span class="hljs-string">"provinsi"</span>, <span class="hljs-string">"gender_ketua_kelompok"</span>])      .size()      .unstack(fill_value=<span class="hljs-number">0</span>))`</div></div>---

### ❓ Kelompok perempuan di Jawa Barat

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5B-%28df%5B%22provinsi%22%5D--1"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[    (df[<span class="hljs-string">"provinsi"</span>] == <span class="hljs-string">"JAWA BARAT"</span>) &    (df[<span class="hljs-string">"gender_ketua_kelompok"</span>] == <span class="hljs-string">"P"</span>)][[<span class="hljs-string">"nama_kelompok"</span>, <span class="hljs-string">"kab"</span>, <span class="hljs-string">"desa_kelompok"</span>]]`</div></div>---

# 5️⃣ FILTER DENGAN `query()` (lebih readable)

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.query%28%22provinsi-%3D"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.query(<span class="hljs-string">"provinsi == 'JAWA BARAT' and gender_ketua_kelompok == 'P'"</span>)`</div></div>Dengan `in`:

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.query%28%22provinsi-i"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.query(<span class="hljs-string">"provinsi in ['JAWA BARAT','JAWA TENGAH']"</span>)`</div></div>---

# 6️⃣ GROUP + FILTER (HAVING-style)

Contoh: **provinsi dengan ≥ 100 kelompok perempuan**

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-g-%3D-df%5Bdf%5B%22gender_ke"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`g = df[df[<span class="hljs-string">"gender_ketua_kelompok"</span>] == <span class="hljs-string">"P"</span>](    g.groupby(<span class="hljs-string">"provinsi"</span>)     .size()     .loc[<span class="hljs-keyword">lambda</span> s: s >= <span class="hljs-number">100</span>])`</div></div>---

# 7️⃣ Best practice (penting banget)

### ✔️ Jangan lupa normalisasi dulu

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5B%22provinsi%22%5D-%3D-df%5B"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[<span class="hljs-string">"provinsi"</span>] = df[<span class="hljs-string">"provinsi"</span>].<span class="hljs-built_in">str</span>.strip().<span class="hljs-built_in">str</span>.upper()df[<span class="hljs-string">"gender_ketua_kelompok"</span>] = df[<span class="hljs-string">"gender_ketua_kelompok"</span>].<span class="hljs-built_in">str</span>.strip().<span class="hljs-built_in">str</span>.upper()`</div></div>### ✔️ Hindari chained indexing

❌

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df%5Bdf%5B%22provinsi%22%5D-%3D%3D-1"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df[df[<span class="hljs-string">"provinsi"</span>] == <span class="hljs-string">"JAWA BARAT"</span>][<span class="hljs-string">"kab"</span>]`</div></div>✔️

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.loc%5Bdf%5B%22provinsi%22"><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.loc[df[<span class="hljs-string">"provinsi"</span>] == <span class="hljs-string">"JAWA BARAT"</span>, <span class="hljs-string">"kab"</span>]`</div></div>---

## TL;DR (pola paling sering)

<div class="contain-inline-size rounded-2xl corner-superellipse/1.1 relative bg-token-sidebar-surface-primary" id="bkmrk-df.loc%5B-df%5B%22kolom%22%5D."><div class="sticky top-[calc(--spacing(9)+var(--header-height))] @w-xl/main:top-9"><div class="absolute end-0 bottom-0 flex h-9 items-center pe-2"><div class="bg-token-bg-elevated-secondary text-token-text-secondary flex items-center gap-4 rounded-sm px-2 font-sans text-xs">  
</div></div></div><div class="overflow-y-auto p-4" dir="ltr">`df.loc[ df[<span class="hljs-string">"kolom"</span>].isin([...]), [<span class="hljs-string">"kolom1"</span>,<span class="hljs-string">"kolom2"</span>] ]df.groupby([...]).agg(...)df.query(<span class="hljs-string">"kondisi"</span>)`</div></div>