Notes
[1] Abramson, J., Ahuja, A., Barr, I., Brussee, A., Carnevale, F., Cassin, M., Chhaparia, R., Clark, S., Damoc, B., Dudzik, A. and Georgiev, P., 2020. Imitating interactive intelligence. arXiv preprint arXiv:2012.05672.
[2] Abramson, J., Ahuja, A., Brussee, A., Carnevale, F., Cassin, M., Fischer, F., Georgiev, P., Goldin, A., Harley, T. and Hill, F., 2021. Creating multimodal interactive agents with imitation and self-supervised learning. arXiv preprint arXiv:2112.03763.
[3] Abramson, J., Ahuja, A., Carnevale, F., Georgiev, P., Goldin, A., Hung, A., Landon, J., Lillicrap, T., Muldal, A., Richards, B. and Santoro, A., 2022. Evaluating Multimodal Interactive Agents. arXiv preprint arXiv:2205.13274.
[4] Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T. and Joseph, N., 2022. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv preprint arXiv:2204.05862.
[5] Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S. and Amodei, D., 2017. Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.