Reinforcement Learning from Human Feedback Basics


Introduction

This is the first paragraph of the introduction chapter. This is a test of citing [1].

First: Images

This is the first subsection. Please admire the gloriousness of this seagull:

A cool seagull.

A bigger seagull:

A cool big seagull.

Second: Tables

This is the second subsection.

Please check the First: Images subsection.

Please check this subsection.

This is an example table.
Index Name
0 AAA
1 BBB

Third: Equations

Formula example: \(\mu = \sum_{i=1}^{N} \frac{x_i}{N}\)

Now, full size:

\[\mu = \sum_{i=1}^{N} \frac{x_i}{N}\]
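The mean formula can be checked numerically. Here is a minimal Ruby sketch that sums N values and divides by N. It uses Ruby to match the chapter's code sample. The helper name `mean` is hypothetical and exists only for illustration, and the sketch assumes a non-empty array of numbers:

```ruby
# Minimal sketch: mu = (sum of the N values) / N.
# `mean` is a hypothetical helper used only for illustration.
def mean(xs)
  # The mean of zero values is undefined, so reject empty input.
  raise ArgumentError, "need at least one value" if xs.empty?

  # Sum as floats (initial value 0.0) so integer input is not truncated,
  # then divide by the count N.
  xs.sum(0.0) / xs.size
end

puts mean([2, 4, 6, 8]) # prints 5.0
```

Starting the sum at `0.0` keeps the division in floating point, so `mean([1, 2])` returns `1.5` rather than an integer result.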

And a code sample:

# Define a method that prints a greeting, then call it.
def hello_world
  puts "hello world!"
end

hello_world

Check these unicode characters: ǽߢð€đŋμ

Fourth: Cross references

These cross references are disabled by default. To enable them, see the Cross references section in the README.md file.

Here’s a list of cross references:

Figure 1: A cool seagull

\[ y = mx + b \tag{1} \]

Table 1: This is an example table.
Index Name
0 AAA
1 BBB

Bibliography

[1] N. Lambert, T. K. Gilbert, and T. Zick, “Entangled preferences: The history and risks of reinforcement learning and human feedback,” arXiv preprint arXiv:2310.13595, 2023.