The idea is simple: if an NLP model is designed to converse with humans, then what better way to see how well it performs than by talking to it? Dubbed Dynabench (as in “dynamic benchmarking”), the system relies on people to ask a series of NLP algorithms probing and linguistically challenging questions in an effort to trip them up. The less an algorithm can be fooled, the better it is at doing its job. What’s more, this dynamic benchmarking system is largely unaffected by the issues that plague static benchmarks. “The process cannot saturate, it will be less prone to bias and artifacts, and it allows us to measure performance in ways that are closer to the real-world applications we care most about,” FAIR researcher Douwe Kiela wrote in the post.

“The nice thing about Dynabench is that if a bias exists in previous rounds and people find a way to exploit these models…” Kiela told Engadget, “we collect a lot of examples that can be used to train the model so that it doesn’t make that mistake anymore.”

What’s really cool is that anyone can give Dynabench a try: it’s open to the public. Users simply have to log into the Dynabench portal to start chatting (via text, of course) with a group of NLP models; no experience is required beyond a basic grasp of the English language. Moving forward, Kiela and his team hope to expand the system’s capabilities with more models, more modalities, and additional languages.
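The human-in-the-loop process Kiela describes can be sketched in a few lines. This is a hypothetical illustration of the general idea, not Dynabench’s actual code or API: people submit labeled examples, and any example the model gets wrong is collected as training data for the next round.

```python
# Hypothetical sketch of a dynamic-benchmarking round: keep every
# human-submitted example that "fools" the model (model output differs
# from the human label), so it can be used to retrain the model later.
# All names here are illustrative, not part of Dynabench itself.

def dynamic_benchmark_round(model, human_examples):
    """Return the examples where the model's prediction
    disagrees with the human-provided label."""
    fooled = []
    for text, label in human_examples:
        if model(text) != label:  # the model was tripped up
            fooled.append((text, label))
    return fooled

# Toy stand-in "model": a naive sentiment rule that only
# checks for the word "good" -- easy for humans to exploit.
def toy_model(text):
    return "positive" if "good" in text else "negative"

examples = [
    ("this movie was good", "positive"),             # model gets this right
    ("not good at all", "negative"),                 # negation fools it
    ("hardly a good use of two hours", "negative"),  # fools it again
]

adversarial = dynamic_benchmark_round(toy_model, examples)
print(len(adversarial))  # 2 fooling examples collected for retraining
```

In the real system the “model” is a trained NLP system rather than a keyword rule, and the collected examples feed the next training round, which is why Kiela says the process cannot saturate.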