📚 Model-based methods are a good fit for offline RL: a model can be trained on the available data and then used to obtain a good policy or to plan directly.
❓ In model-based RL, the trained model is used to answer "what if" questions about different states and actions.
⚙️ Dyna-style methods are adapted to the offline setting by simulating rollouts that start from states and actions in the collected dataset.
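Here is a minimal sketch of the Dyna-style rollout step. It assumes the learned model and the policy are plain Python callables; `dyna_rollouts`, `model`, and `policy` are hypothetical names, not from any library. Rollouts branch off randomly sampled dataset states and are kept short, which is a common way to limit compounding model error.

```python
import numpy as np


def dyna_rollouts(dataset_states, model, policy, horizon, num_starts, rng):
    """Branch short synthetic rollouts off states sampled from the offline dataset.

    model(s, a) -> (next_state, reward) and policy(s) -> action are hypothetical
    callables standing in for a learned dynamics model and the policy being trained.
    """
    start_idx = rng.integers(len(dataset_states), size=num_starts)
    transitions = []
    for s in dataset_states[start_idx]:
        for _ in range(horizon):  # short horizon: model errors compound with length
            a = policy(s)
            s_next, r = model(s, a)
            transitions.append((s, a, r, s_next))
            s = s_next
    return transitions


# Toy usage with hypothetical linear dynamics and a constant policy.
def toy_model(s, a):
    return s + 0.1 * a, float(-np.sum(s ** 2))


def toy_policy(s):
    return np.ones_like(s)


rng = np.random.default_rng(0)
states = rng.normal(size=(100, 3))
batch = dyna_rollouts(states, toy_model, toy_policy, horizon=5, num_starts=10, rng=rng)
print(len(batch))  # 50 synthetic transitions (10 starts x 5 steps)
```

In a full method, these synthetic transitions would typically be mixed with real dataset transitions before training an off-policy learner.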
⛔ One challenge in offline RL is that the policy learns to exploit the model, steering it into out-of-distribution states where the model wrongly predicts high reward.
🔧 Modifying model-based methods to penalize the policy for steering the model into such states incentivizes the policy to stay close to the data.
🔑 MOPO (Model-Based Offline Policy Optimization) modifies the reward function to impose a penalty for exploiting the model.
💡 The uncertainty penalty estimates how wrong the model is at a given state and action, and it is weighted so that it punishes the policy enough to discourage exploitation.
⚙️ Model uncertainty can be estimated, for example, by training an ensemble of models and measuring how much their predictions disagree.
Ensemble disagreement is a common choice of error metric in offline reinforcement learning.
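The following sketch shows one way to turn ensemble disagreement into the reward penalty. It assumes each ensemble member outputs a deterministic next-state prediction and uses the norm of the per-dimension standard deviation across members as the penalty u(s, a). This particular estimator and the names `ensemble_disagreement` and `penalized_reward` are illustrative assumptions; published methods use other variants.

```python
import numpy as np


def ensemble_disagreement(predictions):
    """Estimate u(s, a) as disagreement between ensemble members.

    predictions: array of shape (num_models, batch, state_dim) holding each
    member's predicted next state for the same batch of (s, a) pairs.
    Returns one value per pair: the norm of the per-dimension standard
    deviation across members.
    """
    std = predictions.std(axis=0)        # (batch, state_dim)
    return np.linalg.norm(std, axis=-1)  # (batch,)


def penalized_reward(reward, predictions, lam=1.0):
    """Penalized reward r~(s, a) = r(s, a) - lam * u(s, a)."""
    return reward - lam * ensemble_disagreement(predictions)


# Two ensemble members, one (s, a) pair, 2-D state; they disagree on dimension 0.
preds = np.array([[[0.0, 0.0]], [[2.0, 0.0]]])
reward = np.array([1.0])
print(penalized_reward(reward, preds, lam=0.5))  # [0.5]
```

When all members agree, the penalty vanishes and the policy sees the model's reward unchanged. Larger disagreement lowers the reward, which steers the policy away from regions the model has not learned well.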
The theoretical analysis rests on two assumptions: the true value function belongs to the function class used to measure model error, and the penalty is an upper bound on the model's actual error.
Under these assumptions, the learned policy is guaranteed to perform at least as well as the best policy under a return-minus-error objective.
That best policy trades off high return against visiting states where the model may be incorrect.
In particular, the learned policy is at least as good as the behavior policy minus a term for the model's error on the behavior policy's data, and that error should be small because the model was trained on that data.
If the model is accurate on the states and actions visited by the optimal policy, the learned policy is close to optimal.
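Written out, the guarantee takes roughly the following form, with notation as we recall it from the MOPO paper. Here η_M(π) is the true expected return of π, T̂ is the learned model, ρ^π_T̂ is the state-action distribution of π under the model, u(s, a) is the error penalty, and λ is its weight. Plugging in the behavior policy π^B and the optimal policy π^* gives the two special cases described above.

```latex
\eta_M(\hat{\pi}) \;\ge\; \sup_{\pi}\Big\{\eta_M(\pi) - 2\lambda\,\epsilon_u(\pi)\Big\},
\qquad
\epsilon_u(\pi) = \mathbb{E}_{(s,a)\sim\rho^{\pi}_{\widehat{T}}}\big[u(s,a)\big]

% Special cases: \pi = \pi^B (behavior policy) and \pi = \pi^* (optimal policy)
\eta_M(\hat{\pi}) \ge \eta_M(\pi^B) - 2\lambda\,\epsilon_u(\pi^B),
\qquad
\eta_M(\hat{\pi}) \ge \eta_M(\pi^*) - 2\lambda\,\epsilon_u(\pi^*)
```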
🔍 Using data from the model, the critic's loss function pushes Q-values down on state-action pairs sampled from the model and pushes them up on pairs from the dataset.
🎲 COMBO applies the CQL idea to Dyna-style model-based RL: it improves offline RL by making model-generated states and actions look worse than those in the dataset.
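The critic objective described above can be sketched as a CQL-style loss. It combines a conservative term, which is large when model-generated pairs have higher Q-values than dataset pairs, with an ordinary TD error. This NumPy sketch only evaluates the loss. The function name, the mean-difference form of the regularizer, and the default `beta` are assumptions; a real implementation would minimize the loss with an autodiff framework.

```python
import numpy as np


def conservative_critic_loss(q_model, q_data, q_pred, td_target, beta=1.0):
    """CQL-style critic loss for model-based offline RL (sketch).

    q_model:   Q-values on (s, a) pairs from model rollouts (minimizing pushes them down)
    q_data:    Q-values on (s, a) pairs from the offline dataset (minimizing pushes them up)
    q_pred, td_target: Q predictions and Bellman targets for the TD term
    beta:      weight of the conservative term (hypothetical default)
    """
    conservative = np.mean(q_model) - np.mean(q_data)
    bellman = 0.5 * np.mean((np.asarray(q_pred) - np.asarray(td_target)) ** 2)
    return beta * conservative + bellman


# Model-generated pairs currently look better than dataset pairs, so the loss is positive.
loss = conservative_critic_loss(q_model=[2.0, 2.0], q_data=[1.0, 1.0],
                                q_pred=[1.0, 1.0], td_target=[1.0, 1.0])
print(loss)  # 1.0
```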
📊 The Trajectory Transformer trains a sequence model over entire trajectories to estimate the joint distribution of states and actions, then plans using actions that have high probability under that model.
🔑 Offline RL makes it convenient to use a large, expressive model class such as a Transformer.
🔄 To model multimodal distributions, the trajectory is discretized separately along each dimension of every state and action.
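Here is a minimal sketch of per-dimension discretization. It assumes uniform bins between known per-dimension bounds `low` and `high`, which are hypothetical parameters. It tokenizes only states and actions; the Trajectory Transformer's actual token layout and binning scheme may differ.

```python
import numpy as np


def discretize_trajectory(states, actions, low, high, num_bins=100):
    """Turn a continuous trajectory into a flat sequence of integer tokens.

    Each dimension of every state and action is mapped independently to one of
    num_bins uniform bins between its bounds in low and high.
    Token order: s_0 dims, a_0 dims, s_1 dims, a_1 dims, ...
    """
    steps = np.concatenate([states, actions], axis=1)  # (T, ds + da)
    scaled = (steps - low) / (high - low)               # each dimension mapped to [0, 1]
    tokens = np.clip((scaled * num_bins).astype(int), 0, num_bins - 1)
    return tokens.reshape(-1)                           # (T * (ds + da),)


states = np.array([[0.0], [1.0]])     # T = 2 steps, 1-D state
actions = np.array([[0.5], [-1.0]])   # 1-D action
low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
print(discretize_trajectory(states, actions, low, high, num_bins=4))  # [2 3 3 0]
```

Because each dimension becomes a categorical token, the model predicts a full distribution over bins for every dimension, which can represent several modes instead of a single Gaussian mean.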
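⏲️ Because it models the joint probability of states and actions, the model makes accurate predictions over longer horizons.
For example, the Trajectory Transformer can be used to predict many future steps of a humanoid's motion.
Planning uses beam search, ranking candidate sequences by reward rather than by probability.
Because candidates are generated by the model, the resulting trajectories have high probability under the data, which avoids out-of-distribution states and actions.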
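The reward-ranked beam search can be sketched as follows. `sample_actions` and `step` are hypothetical stand-ins for sampling from and querying the trained sequence model. Because candidate actions come from the model, low-probability (out-of-distribution) actions are rarely proposed, and ranking by cumulative reward picks the best plan among them. The toy usage enumerates a fixed action set instead of sampling.

```python
def beam_search_plan(init_state, sample_actions, step, horizon, beam_width, num_samples):
    """Beam search over action sequences, ranked by cumulative predicted reward.

    sample_actions(state, n) -> candidate actions drawn from the sequence model
        (hypothetical stand-in for sampling action tokens)
    step(state, action) -> (next_state, reward) predicted by the model (hypothetical)
    """
    beams = [(0.0, init_state, [])]  # (cumulative reward, state, action sequence)
    for _ in range(horizon):
        candidates = []
        for total, state, plan in beams:
            for a in sample_actions(state, num_samples):
                next_state, r = step(state, a)
                candidates.append((total + r, next_state, plan + [a]))
        # Rank by reward (not probability) and keep only the best partial plans.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


# Toy 1-D problem: reward is highest when the state reaches 3.
def toy_sample(state, n):
    return [-1, 0, 1][:n]


def toy_step(state, action):
    next_state = state + action
    return next_state, -abs(next_state - 3)


total, final_state, plan = beam_search_plan(0, toy_sample, toy_step,
                                            horizon=3, beam_width=2, num_samples=3)
print(total, plan)  # -3.0 [1, 1, 1]
```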