Exercise: RPN Calculator

High-Tech Institute // C++ fundamentals // Module xx (extra exercises)

Kris van Rens

RPN calculator intro

Goal: Create an RPN calculator

In this exercise, you’ll implement a simple calculator using Reverse Polish Notation (RPN).

You will likely be most familiar to algebraic notation, or infix notation, so first read about RPN first to get an idea of what it is and why it’s easy to implement an interpreter for it. It will take a bit of getting used to RPN before really appreciating it.

Algebraic notation

\(5 + 2\)

\(7 * 21 - 9\)

\((220 / 7 + 10) * 4\)

RPN notation

\(5 \space 2 \space +\)

\(7 \space 21 \space * \space 9 \space -\)

\(220 \space 7 \space / \space 1 \space + \space 4 \space *\)

The “trick” behind RPN calculators: a stack

Each operand is pushed onto the stack, each operator will pop the operands it needs off the stack and pushes its result back onto the stack. It never needs more than two memory cells.

We’ll keep it simple

For the sake of keeping the end result under ±300 lines of C++ code, we’ll keep it simple:

Support for base 10 signed integer math only (at first),
Support for the basic math operations: + / - / * / / / %

There’s a suite of example tests in the solution code that can be used to validate your program.

The solution code is split up in a single version for each step, but generally speaking each solution comprises of a single .cpp file. Feel free to change this to your liking, or divert from the exercise in any other way!

The goal is to have fun and learn something!

Feel free to extend your version at will, my solution code is just one of many possibilities.

Exercise steps

Version 00: Application/build setup

As a first basic step, implement a program to test your build setup.

It doesn’t have to be functional otherwise.

Use an IDE to create a C++ project, build it manually or use a build system like Cmake.

For example: manual build, e.g. using GCC:

$ g++ -Wall -Wextra -Wconversion -Werror -O3 -std=c++20 -o rpn-calculator rpn-calculator.cpp

Feel free to steal the Cmake configuration from the solution code.

Version 01: Reading user input

Add the code to read a string of user input from the command-line terminal.

In the best case, your program can accept any data from “standard input”.

This can be used for testing:

Using echo: echo '5 2 *' | ./rpn-calculator
Bash “here strings”: ./rpn-calculator <<< '5 2 *'

Version 02: Input classification

Add the code to categorize the input string. Classify the code input data as either:

An operand (i.e. signed integer numbers),
An operator (i.e. one of the supported operation symbols),
Invalid data.

Version 03: Input tokenization

The next step in input classification is tokenization: the process of wrapping the classified data into distinct C++ types. With this step we basically “plug in” to the C++ type system and benefit from it.

When modeling the data type(s) for a token, consider the number of states it can have and how many are valid. Try to reach an optimal mapping by minimizing the amount of possible invalid states.

Version 04: Token parsing

Add the code to parse the string data from the token into value that can be used in calculations.

Note: prefer use of C++17’s std::from_chars over the C-style atoi/atol/… functions.

Version 05: Calculation support

Add the code to perform a single calculation using the parsed and tokenized user input.

Hint: a switch statement can be used over integral and enumerator types, or any type contextually implicitly convertible to an integral or enumerator type.

Version 06: Using a stack for calculation memory

Add the code for proper stack-based memory to store operands and results.

Note: take a look at available container types in the C++ standard library.

Version 07: Main state machine setup

In order to accept each operand separately, or accept multiple calculation requests in interactive mode, we need some kind of state machine. Set up the state machine “happy path” first.

Version 08: Filling in the state machine flow

Add the code to deal with each erroneous input in a graceful way.

Version 09: Better error handling

You may have noticed, broadly speaking, we have two classes of errors:

Fatal errors resulting from memory issues or other internal exceptions (log + stop the program),
Calculation errors resulting from user input mistakes (log + reset state machine).

Add the code to deal with the two classes of errors separately and correctly.

Version 10: A real discriminant union

Let’s get back to the earlier observation that the struct Token containing an enum class as a discriminant and a std::string value as data field is not ideal. For example, suppose the input token is Invalid, we can still call parse which then fails. This data structure is just a bad mapping on the problem space.

We need a “true enumerator”, or “discriminant union”. In C++ we can use std::variant for that.

Each token type will be represented by its own C++ type. This hugely improves the data structure mapping.

Rewrite the Token type to be a std::variant. Each state can keep its own state-specific data, and can have its own state-specific behavior.

Hints for convenient use of `std::variant`

In std::variant, the states are “visited” using std::visit. This is a bit of a clunky method.

Use the following setup to visit states using lambdas:

namespace States
{
struct A {};
struct B {};
struct C {};
}

using State = std::variant<States::A, States::B, States::C};

template<typename... Ts>
struct overload : Ts... {
  using Ts::operator()...;
};

State s = States::A{};

std::visit(overload{
  [](States::A) {
    // ..do state 'A' stuff..
  },
  [](States::B) {
    // ..do state 'B' stuff..
  },
  [](States::C) {
    // ..do state 'C' stuff..
  },
}, s);

It uses the overload class template to choose the right lambda function call.

Version 11: Support for longer calculations

In order to support longer calculations, like \(7 \space 21 \space * \space 9 \space -\), we must adapt the state machine flow.

The result of each calculation is stored on the stack, as if it were the first operand. We need an alternative way to detect the calculation end, which will influence the behavior of the calculator, especially the interactive mode.

Version 12: A better approach to state machines

Similar to how we changed the Token type into a std::variant, we can change the States enumerator.

When adding state-specific data or behavior later, this can be done cleanly.

Version 13: Improving robustness

Find situations where calculations fail, and add code to remediate / gracefully fail.

For example, what happens if we enter:

\(1 \space 0 \space /\)
\(999999999999999999999999999 \space 1 \space +\)
\(999999999999999999999 \space 999999999999999999999 \space *\)

Version 14: Making the calculation generic

For future improvements / enhancements, we’d like to make the calculate function type-independent.

Add the code and checks to make calculate generic over data types.

Version 15: Making parsing generic

Similar to calculate, we want to make Token::Operand::parse type-independent.

Add the code and potential checks to make Token::Operand::parse generic over data types.

Version 16: Using a custom stack implementation

Up until now we’ve used the std::stack type from the C++ standard library. In the C++ course, we wrote our own Stack class template type, and even created a partial specialization on a maximum size. This custom stack may well be much more resource-efficient than std::stack.

Add the custom Stack implementation specialized for a maximum stack size of 2.

Version 17: Adding unit tests

All the functions in our code base until now should be unit-tested. This can be done using a test framework like Doctest, Google Test or Catch2. Unfortunately, C++ itself does not offer much facilities for this

Add code to unit-test the functions (calculate, Token::Operand::parse, read_token) in the program.

Version 18: Support for floating-point calculations

Now that Token::Operand::parse and calculate are type-independent, we can switch our base data type.

Switch the base data type from long to float.

What happens to the parsing code? Is the rest of the program still valid?

End of module

High Tech Institute // C++ fundamentals // Module xx (extra exercises)

All emoji in this presentation are part of the Twemoji set, licensed under CC-BY 4.0. All other images are mine, unless specified otherwise.