The Explore vs. Exploit Dilemma

How to make better decisions when you have incomplete information.

Apr 06, 2025

Coding for Kids (and Sophomores)

As a sophomore in college, I took an introductory coding class where we used Scratch, a programming language designed for kids.

The class wasn’t just about syntax or building apps—it was about the ideas behind programming. The logic. The trade-offs. The strategy.

One concept stuck with me: the Explore vs. Exploit dilemma. It's a key principle in reinforcement learning. But I didn’t just see it as a machine learning idea.

I saw it everywhere in life.

The Explore vs. Exploit Dilemma

In reinforcement learning, this dilemma is about choosing between:

Exploitation: Picking the best option based on what you currently know.
Exploration: Trying something new, hoping to find something better—even if it comes with short-term risk.

The goal? Maximize long-term reward. Easy to say. Hard to execute.

Life Is One Giant Explore vs. Exploit Problem

Once I learned this concept, I started seeing it in every decision.

Career: Do you double down on your current role—or explore a new path?
Dating: Do you commit—or keep meeting people?
Sales: Stick with the channel that works—or try a new strategy?

In the early stages of a career, you explore. You try industries, roles, environments. Eventually, you find something that works. It fits. It pays the bills. You’re good at it. So you exploit.

The hard part is knowing when to switch between the two. Or when to do a bit of both.

The Multi-Armed Bandit Problem

In computer science, there’s a classic analogy to represent this dilemma: the multi-armed bandit problem.

Imagine walking into a casino. There are three slot machines. You want to win the most money over time.

You try machine A. It gives you $5. Then B: $8. You go back to B, and it gives you $3. C gives you $7. So… now what? Keep going with C? Back to A? Try B again?

You don’t know the true averages. You’re working with incomplete, evolving data. Sound familiar?
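To make "incomplete data" concrete, here is a minimal Python sketch that turns the four pulls above into a running average per machine. The variable names are my own, and the averages are only estimates from a tiny sample, not the machines' true payouts.

```python
from collections import defaultdict

# Payouts observed so far, in the order described above: (machine, dollars).
pulls = [("A", 5), ("B", 8), ("B", 3), ("C", 7)]

totals = defaultdict(float)
counts = defaultdict(int)
for machine, payout in pulls:
    totals[machine] += payout
    counts[machine] += 1

# Sample average per machine: an estimate, not the true long-run average.
estimates = {m: totals[m] / counts[m] for m in counts}
best_so_far = max(estimates, key=estimates.get)

print(estimates)    # {'A': 5.0, 'B': 5.5, 'C': 7.0}
print(best_so_far)  # C
```

C looks best, but that rests on a single pull. B's two results ($8 and $3) show how much one machine can swing, which is exactly why the choice is hard.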

Epsilon-Greedy Strategy (And How We Already Use It)

Computer scientists developed a strategy called the epsilon-greedy method, where epsilon (ε) is the fraction of the time you spend exploring. It works like this:

Most of the time (say, 90%), you go with the best-known option.
The rest of the time (10%, so ε = 0.1), you explore by picking an option at random, just in case there’s something better.
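Here is a rough sketch of that loop in Python. The function name, the payout averages, and the bell-curve noise are all assumptions I made up for illustration; real slot machines (and real life) won't be this tidy.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, rounds=1000, noise=1.0, seed=0):
    """Simulate epsilon-greedy on slot machines with hypothetical average payouts."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how many times each machine was played
    estimates = [0.0] * n   # running average payout seen from each machine
    total = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick any machine at random
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit: best known
        reward = rng.gauss(true_means[arm], noise)  # assumed noisy payout
        counts[arm] += 1
        # Incremental update of the running average for this machine.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return counts, estimates, total

# Hypothetical true averages for machines A, B, and C.
counts, estimates, total = epsilon_greedy([5.0, 5.5, 7.0], epsilon=0.1)
print(counts, [round(e, 2) for e in estimates], round(total, 2))
```

Setting `epsilon=0` in this sketch means it never explores: it plays the first machine, likes what it sees, and has no reason to ever try the others.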

You probably use this in real life without realizing it. You mostly stick with what’s working—but sometimes you shake things up. A new route to work. A new hobby. A new book.

We’re all running our own epsilon-greedy algorithms.

But Algorithms Have Limits

But here’s the thing about any framework, even a smart one like epsilon-greedy:

It works great most of the time.
It keeps you moving forward, limits regret, and helps you make solid decisions under uncertainty.

But some decisions? They’re bigger than algorithms.

Where Epsilon-Greedy Fails

Even the smartest strategies fall short.

Let’s say you're choosing between two paths:

→ 7 → 12 → 6
→ 7 → 3 → 99

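Read each number as the reward at that step. After the shared 7, a purely greedy choice takes the 12 (because it looks best right now) and finishes with 25 in total. It misses the 99 hidden down the other path, which adds up to 109. It doesn’t look far enough ahead.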
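The two paths above can be sketched in a few lines of Python. The tree encoding and both function names are hypothetical, chosen just for this example: one function grabs the biggest immediate reward at every step, the other checks every path to the end.

```python
# Hypothetical encoding of the two paths: each node is (reward, [children]).
tree = (7, [
    (12, [(6, [])]),
    (3, [(99, [])]),
])

def greedy_path(node):
    """Follow the child with the largest immediate reward at each step."""
    reward, children = node
    path = [reward]
    while children:
        reward, children = max(children, key=lambda child: child[0])
        path.append(reward)
    return path

def best_path(node):
    """Check every path to the end and return the one with the highest total."""
    reward, children = node
    if not children:
        return [reward]
    return [reward] + max((best_path(c) for c in children), key=sum)

print(greedy_path(tree), sum(greedy_path(tree)))  # [7, 12, 6] 25
print(best_path(tree), sum(best_path(tree)))      # [7, 3, 99] 109
```

Checking every path only works here because the tree is tiny and fully known. In life you rarely get to see the whole tree, which is the real problem.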

We do this too. We follow what’s working right now, and never find out what could’ve been better if we’d explored a little longer.

**A suboptimal algorithm doesn’t explore far enough. So it misses the most rewarding path.**

My Story

When I was 22, I was seriously discerning the Catholic priesthood. I had a room reserved at the Borromeo House in Austin—a place for young men preparing for seminary.

But around that time, I reconnected with someone I’d dated a few years before. We started talking again. That same day, I scribbled this into the margin of a book:

“Any situation that requires a moral decision cannot be planned in a hypothetical scenario. Prudence of the individual person in that particular situation is the only scenario that has validity and is grounded in truth.

You cannot apply a formula or moral equation to the messiness of life. Rather, you must make the appropriate reply to your current case. Your path in life may not fully make sense right now—but looking back years from now, it will.”

I canceled the plan. I went home. We started dating.

This past July, we got married.

That 3 → 99 path? That was her. Looking back, it makes sense why it all unfolded that way.

What’s the Takeaway?

Finding the right balance between exploring and exploiting works most of the time.

The epsilon-greedy method (90% exploitation, 10% exploration) is a solid life strategy. Do what works. But leave room for curiosity.

But if you feel a particular pull to fully commit to something during your 10% exploration—something that isn’t directly on the path you were on, but feels right?

Pray on it.

And if you get the green light from God, full send.

Get In The Arena Anyway

Even if things don’t work out, it’s better to be the person who tried—who explored—than the one who played it safe and never found out.

In the words of Theodore Roosevelt:

“If he fails, at least fails while daring greatly,
so that his place shall never be with those cold and timid souls
who neither know victory nor defeat.”

You don’t have to always choose the perfect path.

You just have to choose the one that requires courage.

— Grant Varner

If you're curious about the Multi-Armed Bandit problem or want a simple breakdown of how epsilon greedy works, this 5-minute explainer by ritvikmath on YouTube helped me brush up while writing this.
