Like millions of other people, the first thing Mark Humphries did with ChatGPT when it was released in late 2022 was ask it to perform parlor tricks, like writing poetry in the style of Bob Dylan. The results were very impressive but did not seem particularly useful to him, a historian studying the 18th-century fur trade. But Humphries, a 43-year-old professor at Wilfrid Laurier University in Waterloo, Canada, had long been interested in applying artificial intelligence to his work. He was already using a specialized text recognition tool designed to transcribe antiquated scripts and typefaces, though it made frequent errors that took time to correct. Curious, he pasted the tool’s garbled interpretation of a handwritten French letter into ChatGPT. The chatbot corrected the text, fixing every F that had been misread as an S and even adding missing accents. Then Humphries asked ChatGPT to translate it into English. It did that, too. Maybe, he thought, this thing would be useful after all.
USC rides 6-game win streak to No. 7 in AP poll
Six straight wins by USC have bumped the Trojans up to No. 7 in the Associated Press women's basketball poll, while South Carolina is still the unanimous No. 1.
USC swept Oregon and then-No. 11 Oregon State over the weekend. The Trojans moved into a tie for second place in the Pac-12 Conference, two games behind Stanford. They host Colorado and Utah this weekend.
The six teams in front of USC didn't change, with South Carolina leading the way, as it has since the regular season began. The Gamecocks received all 35 votes from a national media panel. South Carolina was tested in both its games last week, rallying to beat Tennessee and Georgia. Coach Dawn Staley's team trailed at halftime at home against Georgia on Sunday before winning by 14 points.
Ohio State was right behind South Carolina, marking the first time in seven weeks that a No. 2 team held its place for two consecutive polls. No. 3 Stanford, Iowa, Texas and NC State followed the Buckeyes.
Virginia Tech climbed four spots to eighth after beating Duke and Louisville. The Hokies have won nine in a row and sit in first place in the Atlantic Coast Conference, a game in front of Syracuse.
Oregon State moved up to ninth despite the loss to USC after beating then-No. 9 UCLA.
Kansas State fell three places to No. 10, while Colorado and UCLA also dropped three places.
GAME OF THE WEEK
With the NCAA career scoring record in the rearview mirror, Caitlin Clark leads Iowa into Indiana for a key Big Ten Conference matchup. Iowa is tied for second, a game behind Ohio State. The Hawkeyes beat the Hoosiers by 27 points at home last month. Expect Thursday's rematch to be a bit more competitive.
EXCITEMENT OUT WEST
It was an exciting weekend in the Pac-12 Conference, with Oregon State topping UCLA on a shot at the buzzer and Utah beating Colorado in the final second. The conference is still tops in the poll, with five teams in the first 12 and six ranked in the Top 25 overall.
The ACC and Big 12 are next with five teams each. The Big Ten has three teams, and the Big East and SEC each have two. The Ivy and West Coast each have one.
Last Thursday, the NCAA held its first reveal of the top 16 teams at that point in the season. South Carolina, Ohio State, Stanford and Colorado were the 1-seeds. The Buffaloes have since lost. The next reveal will be Feb. 29.
IVY LEAGUE SHOWDOWN
No. 25 Princeton has won 15 straight games and sits a game in front of Columbia in the Ivy League standings. The Tigers won the first meeting last month, and the two teams play in New York on Saturday with first place in the conference on the line. They shared the regular-season title last year, the first in Columbia's history.
Sports betting hits record $11B in 2023 revenue
The American Gaming Association reported Tuesday a record $10.92 billion in 2023 revenue for the sports betting industry.
The huge year represented a 44.5% year-over-year increase from 2022, which previously held the record. A handle of $119.84 billion (a 27.8% year-over-year increase), combined with a sportsbook win percentage of 9.1% (up from 8.1% in 2022), contributed to the record.
AGA notes that these figures do not include Arizona's November data or Kentucky's November and December data, which had not yet been reported.
Another key factor was the addition of five new legal betting states (Kentucky, Maine, Massachusetts, Nebraska and Ohio), all of which launched in 2023. Ohio has quickly established itself as a sports betting hotbed, bringing in $936.6 million to rank as the fourth-highest-earning state in the country. The five new states combined to bring in $1.49 billion last year.
Meanwhile, New York maintained its dominance at the top of the leaderboard by accumulating $1.697 billion in 2023 revenue, while New Jersey ($1.007 billion) and Illinois ($1.002 billion) each eclipsed $1 billion in annual revenue for the first time.
Aside from the yearlong numbers, the sports betting industry also posted a record quarter by bringing in $3.41 billion in revenue in the fourth quarter of 2023 -- a 30.8% increase from the fourth quarter of 2022 and a 19.6% increase from the record set in the first quarter of 2023.
Another quarterly record could be on tap. The AGA projected that Americans would bet $23.1 billion on Super Bowl LVIII, and Nevada ($185.6M) and New York ($162.2M) have already reported enormous handles from the game. With North Carolina launching legal online sports betting in early March, just in time for March Madness, a new record quarter for Q1 2024 seems all but assured.
While another record year could be in store for 2024, the industry is wary of the notable slowdown in states legalizing sports betting.
"I think we're in some ways victims of our own success over the past five years in that we now have 38 states plus DC that have already taken the step of legalizing and regulating sports betting, so there's a lot fewer states left on the board," AGA senior vice president of government relations Chris Cylke told reporters Tuesday. "Some of them have pretty significant political challenges in terms of getting sports betting itself enacted.
"When we look at the map, there could be potentially two or three states that have some activity and get it across the finish line this year. There could be zero, which would be disappointing, but that's where we are right now with the progress that we've made over the past almost six years."
Overall, the American gambling industry, which also includes land-based casino gambling and internet gambling, had a record year, posting $66.52 billion in 2023 revenue, a 10% increase from 2022's record. Land-based gambling alone brought in $50.02 billion, a commanding 75.3% of all revenue.
AGA also touts that gambling taxes generated $14.4 billion for state and local governments in 2023.
How AI can make history
A scholar of 18th-century history was overwhelmed by piles of letters, journals, and legal documents. He tried using AI on a whim — and found it surprisingly useful.
For Humphries, AI tools held a tantalizing promise. Over the last decade, millions of documents in archives and libraries have been scanned and digitized — Humphries was involved in one such effort himself — but because their wide variety of formats, fonts, and vocabulary rendered them impenetrable to automated search, working with them required stupendous amounts of manual research. For a previous project, Humphries pieced together biographies for several hundred shellshocked World War I soldiers from assorted medical records, war diaries, newspapers, personnel files, and other ephemera. It had taken years and a team of research assistants to read, tag, and cross-reference the material for each individual. If new language models were as powerful as they seemed, he thought, it might be possible to simply upload all this material and ask the model to extract all the documents related to every soldier diagnosed with shell shock.
“That’s a lifetime’s work right there, or at least a decade,” said Humphries. “And you can imagine scaling that up. You could get an AI to figure out if a soldier was wounded on X date, what was happening with that unit on X date, and then access information about the members of that unit, that as historians, you’d never have the time to chase down on an individual basis,” he said. “It might open up new ways of understanding the past.”
Improved database management may be a far cry from the world-conquering superintelligence some predict, but it’s characteristic of the way language models are filtering into the real world. From law to programming to journalism, professionals are trying to figure out whether and how to integrate this promising, risky, and very weird technology into their work. For historians, a technology capable of synthesizing entire archives that also has a penchant for fabricating facts is as appealing as it is terrifying, and the field, like so many others, is just beginning to grapple with the implications of such a potentially powerful but slippery tool.
AI seemed to be everywhere at the 137th annual meeting of the American Historical Association last month, according to Cindy Ermus, an associate professor of history at the University of Texas at San Antonio, who chaired one of several panels on the topic. Ermus described her own relationship to AI, and that of many of her colleagues, as that of “curious children,” wondering with both excitement and wariness what aspects of their work it will change and how. “It’s going to transform every part of historical research, from collection, to curation, to writing, and of course, teaching,” she said. She was particularly impressed by Lancaster University lecturer Katherine McDonough’s presentation of a machine learning program capable of searching historic maps, initially trained on Ordnance Survey maps of 19th-century Britain.
“She searched the word ‘restaurant,’ and it pulled up the word ‘restaurant’ in tons of historical maps through the years,” Ermus said. “To the non-historian, that might not sound like a big deal, but we’ve never been able to do that before, and now it’s at our fingertips.”
Another attendee, Lauren Tilton, professor of liberal arts and digital humanities at the University of Richmond, has worked with machine learning for over a decade and recently collaborated with the Library of Congress to apply computer vision to the institution’s vast troves of minimally labeled photos and films. All archives are biased, both in what material is saved to begin with and in how it is organized. The promise of AI, she said, is that it can open up archives at scale and make them searchable for things the archivists of the past didn’t value enough to label.
“The most described materials in the archive are usually the sort of voices we’ve heard before — the famous politicians, famous authors,” she said. “But we know that there are many stories by people of minoritized communities, communities of color, LGBTQ communities that have been hard to tell, not because people haven’t wanted to, but because of the challenges of how to search the archive.”
AI systems have their own biases, however. They have a well-documented tendency to reflect the gender, racial, and other biases of their training data: when Ermus asked GPT-4 to create an image of a history professor, for example, it drew an elderly white man with elbow patches on his blazer. But they also display a bias that Tilton calls “presentism.” Because the vast preponderance of training data is scraped from the contemporary internet, models reflect a contemporary worldview. Tilton encountered this phenomenon when she found that image recognition systems struggled to make sense of older photos, labeling typewriters as computers and paperweights as mice. Language models have a similar problem.
Impressed with ChatGPT, Humphries signed up for the OpenAI API and set out to make an AI research assistant. He was trying to track 18th-century fur traders through a morass of letters, journals, marriage certificates, legal documents, parish records, and contracts in which they appear only fleetingly. His goal was to design a system that could automate the process.
One of the first challenges he encountered was that 18th-century fur traders do not sound anything like a language model assumes. Ask GPT-4 to write a sample entry, as I did, and it will produce lengthy reflections on the sublime loneliness of the wilderness, saying things like, “This morn, the skies did open with a persistent drizzle, cloaking the forest in a veil of mist and melancholy,” and “Bruno, who had faced every hardship with the stoicism of a seasoned woodsman, now lay still beneath the shelter of our makeshift tent, a silent testament to the fragility of life in these untamed lands.”
An actual fur trader would be far more concise. For example, “Fine Weather. This morning the young man that died Yesterday was buried and his Grave was surrounded with Pickets. 9 Men went to gather Gum of which they brought wherewith to Gum 3 Canoes, the others were employed as yesterday,” as one wrote in 1806, referring to gathering tree sap to seal the seams of their bark canoes.
“The problem is that the language model wouldn’t pick up on a record like that, because it doesn’t contain the type of reflective writing that it’s trained to see as being representative of an event like that,” said Humphries. Trained on contemporary blog posts and essays, it would expect the death of a companion to be followed by lengthy emotional remembrances, not an inventory of sap supplies.
By fine-tuning the model on hundreds of examples of fur trader prose, Humphries got it to pull out journal entries in response to questions, but not always relevant ones. The antiquated vocabulary still posed a problem — words like varangue, a French term for the rib of a canoe that would rarely appear in the model’s training data, if ever.
After much trial and error, he ended up with an AI assembly line using multiple models to sort documents, search them for keywords and meaning, and synthesize answers to queries. It took a lot of time and a lot of tinkering, but GPT helped teach him the Python he needed. He named the system HistoryPearl, after his smartest cat.
He tested his system against edge cases, like the Norwegian trader Ferdinand Wentzel, who wrote about himself in the third person and deployed an odd sense of humor, for example, writing about the birth of his son by speculating about his paternity and making self-deprecating jokes about his own height — “F. W.’s Girl was safely delivered of a boy. - I almost believe it is his Son for his features seem to bear some resemblance of him & his short legs seem to determine this opinion beyond doubt.” This sort of writing stymied earlier models, but HistoryPearl could pull it up in response to a vaguely phrased question about Wentzel’s humor, along with other examples of Wentzel’s wit Humphries hadn’t been looking for.
The tool still missed some things, but it performed better than the average graduate student Humphries would normally hire to do this sort of work. And faster. And much, much cheaper. Last November, after OpenAI dropped prices for API calls, he did some rough math. What he would pay a grad student around $16,000 to do over the course of an entire summer, GPT-4 could do for about $70 in around an hour.
“That was the moment where I realized, ‘Okay, this begins to change everything,’” he said. As a researcher, it was exciting. As a teacher, it was frightening. Organizing fur trading records may be a niche application, but a huge number of white-collar jobs consist of similar information management tasks. His students were supposed to be learning the sorts of research and thinking skills that would allow them to be successful in just these sorts of jobs. In November, he published a newsletter imploring his peers in academia to take the rapid development of AI seriously. “AI is simply starting to outrun many people’s imaginations,” he wrote. “They’re still talking about the technology as if it is a theoretical thing without the full understanding that it poses a very real, existential threat to our whole raison d’être as higher educators.”
In the meantime, though, he was pleased that his tinkering had resulted in what he calls a “proof of concept”: reliable enough to be potentially useful, though not yet enough to fully trust. Humphries and his research partner, the historian Lianne Leddy, submitted a grant application to scale their research up to all 30,000 voyageurs in their database. In a way, he found the labor required to develop this labor-saving system comforting. The largest improvements in the model came from feeding it the right data, something he was able to do only because of his expertise in the material. Lately, he has been thinking that there may actually be more demand for domain experts with the sort of research and critical assessment skills the humanities teach. This year he will teach an applied generative AI program he designed, run out of the Faculty of Arts.
“In some ways this is old wine in new bottles, right?” he said. In the mid-20th century, he pointed out, companies had vast corporate archives staffed by researchers who were experts, not just in storing and organizing documents, but in the material itself. “In order to make a lot of this data useful, people are needed who have both the ability to figure out how to train models, but more importantly, who understand what is good content and what’s not. I think that’s reassuring,” he said. “Whether I’m just deluding myself, that’s another question.”