Amazon wants users to evaluate AI models more effectively and to involve more humans in the process.
During the AWS re:Invent conference, AWS vice president of database, analytics, and machine learning Swami Sivasubramanian announced Model Evaluation on Bedrock, now available in preview, for models hosted in its Amazon Bedrock repository. Without a way to transparently test models, developers may end up using one that is not accurate enough for a question-and-answer project or one that is too large for their use case.
“Model selection and evaluation is not just done at the beginning, but is something that’s repeated periodically,” Sivasubramanian said. “We think having a human in the loop is important, so we are offering a way to manage human evaluation workflows and metrics of model performance easily.”
Sivasubramanian told The Verge in a separate interview that developers often don’t know whether they need a larger model for a project and simply assume a more powerful one will handle their needs. They later find out they could have built on a smaller one.
Model Evaluation has two components: automated evaluation and human evaluation. In the automated version, developers can go into their Bedrock console and choose a model to test. They can then assess the model’s performance on metrics like robustness, accuracy, or toxicity for tasks like summarization, text classification, question answering, and text generation. Bedrock includes popular third-party AI models like Meta’s Llama 2, Anthropic’s Claude 2, and Stability AI’s Stable Diffusion.
While AWS provides test datasets, customers can bring their own data into the benchmarking platform so they’re better informed about how the models behave on it. The system then generates a report.
If humans are involved, users can choose to work with an AWS human evaluation team or their own. Customers must specify the task type (summarization or text generation, for example), the evaluation metrics, and the dataset they want to use. AWS will provide customized pricing and timelines for those who work with its assessment team.
AWS vice president for generative AI Vasi Philomin told The Verge in an interview that a better understanding of how the models perform leads to better development decisions. It also lets companies see whether models fail to meet some responsible AI standards, such as toxicity sensitivity that is too low or too high, before they build on the model.
“It’s important that models work for our customers, to know which model best suits them, and we’re giving them a way to better evaluate that,” Philomin said.
Sivasubramanian also said that when humans evaluate AI models, they can assess qualities that the automated system can’t, such as empathy or friendliness.
AWS will not require all customers to benchmark models, said Philomin, as some developers may have worked with some of the foundation models on Bedrock before or have an idea of what the models can do for them. Companies that are still exploring which models to use could benefit from going through the benchmarking process.
AWS said that while the benchmarking service is in preview, it will only charge for the model inference used during the evaluation.
While there is no particular standard for benchmarking AI models, there are specific metrics that some industries generally accept. Philomin said the goal for benchmarking on Bedrock is not to evaluate models broadly but to offer companies a way to measure the impact of a model on their projects.