I Challenged My AI Clone to Replace Me for 24 Hours | WSJ

– Today we're going to create an avatar that looks and moves like me. (camera snapping) We begin by smiling into the camera. (dramatic music)
I love to smile. Smiling's my favorite. I'm breathing gently for a short second. Is this what breathing looks like? (dramatic music) I'm Joanna Stern, and I'm
excited to host this video. No, I am the real Joanna. Okay, so I cloned myself. Kinda. Howdy. Why? Well, the latest AI
tools that generate text and images already make it
hard to tell the difference between what's real and what's fake. What's coming next with AI-generated voice and video is gonna blur
the lines even more. So I came up with a challenge. Can I replace myself with AI for the day? Yes, I came up with four challenges to see if AI me could sub in for real me so real me had more time for me things. (tranquil music) Or at least that's how I wanted it to go.

– Still a little creeped out that I'm looking at a frozen Joanna. – Okay, let's do this. – Scene three, take two, calibration. (board clapping) – Before we get into the challenges, let's talk about my AI avatar, which was made by a
startup called Synthesia. Going to make my avatar. At a professional studio in New York, the company recorded me doing
a series of head movements. I feel like I'm at the eye doctor. Okay. And reading through a rather
odd pre-written script.

Positive thinking will help you believe in yourself and fill you with
self-esteem and confidence. After that, I headed to an audio studio where I recorded another
script for about an hour. My name is Joanna Stern
and I hereby consent to this audio recording
to create a custom voice. The company took that all
and used it as training data and ran it through their
AI neural networks. (dramatic music)
(text buzzing) Hello, Joanna. You don't mind if I call you Joanna, do you? Okay. So the voice isn't the best. A tool called ElevenLabs
produced something better after my producer Kenny uploaded two hours of my previous recordings. I am the real Joanna. I am the real Joanna. I am the real Joanna. Both Synthesia and
ElevenLabs work similarly.

Type in anything and AI Joanna
just says it right back. Synthesia is aimed at companies that want to make internal videos. It charges at least $1,000
to create a custom avatar. Creating a voice clone with ElevenLabs is $5 a month. Challenge one: phone calls. I happened to have a
call scheduled that day with Evan Spiegel, the CEO of Snap. The company recently released My AI, a chatbot within the popular app. Hey Evan, it's Joanna. Do you worry that if we
chat with AI all day, we'll stop talking to our real friends? – [Evan] Definitely not
what we've been seeing. I think that's one of the real benefits of our sort of testing
and learning approach. So far, I think if anything, it's gonna become a
conversation enhancement and improve the way
that people communicate with their friends and family. – Did you think by any
chance that my question to you was generated by an AI voice? (Evan chuckling) – [Evan] No. No. I mean, the first word or two was a little bit of a giveaway, but I thought maybe you
were extra serious today. (Joanna chuckling) – [Joanna] Even my own
sister was pretty fooled when I called her about her dead fish. – [Julia] Hello? – Hey, Jules. I just heard about Swimmy Dimi and I wanted to let you know
how sorry I am for your loss. Did you think it was me? – [Julia] At first, yes. And then no. Like it sounds, it's obviously
exactly like you, but just with the fact that like, it
doesn't pause for talking back. – Challenge one: pass. Challenge two: create a TikTok. I asked ChatGPT to write a TikTok script in the voice of Joanna Stern
about an obscure iOS 16 tip. The hardest thing was getting
ChatGPT to write the truth. It just made stuff up. Finally, I got a good one. Although the writing
certainly was not very me. I pasted the script into Synthesia, put a green screen behind
my avatar and exported it. While the WSJ TikTok team edited, I... (pleasant piano music)
(Joanna snoring) I was pretty impressed
with the final TikTok.

TikTok fam, it's Joanna
Stern, your iOS wizard. Today we're unearthing the hidden world of back tap gestures. I love that I did not have to shoot this. I did not have to put on nice clothes, do my hair, do my makeup, say these lines. But TikTok was less impressed. They picked up on the fact that the avatar never moves its arms, that the mouth movements
don't always match the audio and that there's little facial expression. Synthesia has already
started to improve a lot of this in beta versions of its avatars. – Look, I can nod my head. (dramatic music) – Challenge two: fail. Challenge three: bank biometrics.

Instead of asking security questions, some banks use your
voice to confirm it's you before transferring you
to a customer service rep. – [System] This call will
be monitored and recorded and your voice may be
used for verification. Please speak your first and last name, followed by your mailing address. – Joanna Stern.
(beeping sound) – [Nikki] This is Nikki with
Chase credit card services. – It worked. Chase confirmed the voice
and put me straight through to a service rep. No additional questions asked. Later in the day, I asked
our intern Slav to try to do his best impression of
me to see what would happen. – [System] Please speak
your first and last name, followed by your mailing address. (beeping sound) – Joanna Stern.
(beeping sound) – [System] Please enter the
last three digits printed on the signature panel
on the back of your card. – See, in Slav's case, the voice biometric system didn't buy it. It asked for further verification. When I reached out to
Chase, a spokeswoman said, "We use voice biometrics,
along with a variety of other methods to authenticate
customers who call us." She added that to complete requests, customers must provide
additional information.

Challenge three: pass. Challenge four: video calls. I asked ChatGPT to generate
some generic meeting phrases and exported videos of
my avatar saying them. Then I installed some software on my Mac to pump that video
into my Google Meet calls. That sounds good. – [Caller 1] Oh, you're muted, Joanna. My God, is this the real Joanna? – Yeah, this looks like a fake. It sounds good. – She looks, yeah, what is happening here? – How did you know that it wasn't me? – It looked like a
hologram version of you. – [Caller 2] It was the posture for me. – She also didn't make any jokes. – Challenge four: big fail. So what did we learn today? We learned that video clones aren't going to fool anyone yet but AI voices are quite good. We also learned that while you
could use these to save time, people could also misuse them.

Do I wanna avoid going
to the studio some days? Yep. Do I fear scammers using
our voices to call banks or our families? Yep. Synthesia says it requires
those creating avatars to give verbal consent. ElevenLabs requires you
check a box saying you have permission to use the voice and the company says it's
capable of identifying its voices if they are misused. Either way, it means we're
all going to have to be on high alert to tell
the real versus the AI. – And finally, stay human, everyone. Good luck. I am inevitable. (fingers snapping)
