To power its machine learning and computer vision technology, GumGum needs a lot of training data. To meet its data needs, about two years ago the company turned to Figure Eight, a crowdsourcing machine learning annotation vendor.
Acquired by Appen, another crowdsourcing machine learning annotation company, in April 2019, Figure Eight provides training data to a variety of similar vendors. Figure Eight relies on a network of contributors to annotate huge amounts of data.
The contributors are trained, although most are not data scientists, and are screened for security purposes. The large contributor network enables Figure Eight to annotate data at scale, as well as to keep reviewing annotated data while a job is running.
Getting training data
Before using Figure Eight, GumGum employed full-time staff for machine learning annotation, said Erica Nishimura, data curator at GumGum. That worked, but it was costly and, at times, slow. With large amounts of data, it could take months to get usable training data. Moreover, the staff could only work in English, while GumGum has clients internationally.
Figure Eight, meanwhile, works in a number of languages. At the time, Nishimura said, it was one of the few companies that worked in Japanese. As GumGum has a thriving Japanese division, the language support was one of the main reasons it chose Figure Eight.
Scalability, said Lane Schechter, product manager at GumGum, was the other reason GumGum chose Figure Eight.
Working with Figure Eight has increased GumGum’s data capacity tenfold, Schechter said. Also, instead of taking months to get completed machine learning annotation, it now happens in about a week.
Still, that’s not to say that working with Figure Eight has been without its share of problems.
One of the biggest challenges has been communicating directly with Figure Eight’s crowdsource contributors, Nishimura said.
At times, contributors have had trouble understanding exactly what GumGum wants. But because there is no way to interact with them directly, Nishimura said, it is hard to know whether the contributors are having problems, or what those problems might be.
The best GumGum can do is post a message, Nishimura said, but there is no way to alert each contributor to it. And a single message isn't the same as having a conversation, she added.
While she was unsure if other similar crowdsourcing machine learning annotation companies have a better way to communicate with contributors, Nishimura said some other companies have their own checkers, who do spot-checks on completed annotations.
“It’s one more step to ensure quality,” Nishimura said. But, she added, the prices of those services are generally higher than Figure Eight’s.