Journey of Voice


Ten years ago, I used to open Facebook to share what was on my mind, mostly my favorite lines from a Pink Floyd song. This went on for a while, until I became comfortably numb 😐

Later, I saw friends uploading images of trips, moments, moods, and selfies. I tried to follow suit for a while, but not for long. Maybe I preferred saying things to posting stills. I did keep at it for a few years, or at least tried to keep my profile picture updated. But now, whenever I open Facebook, I end up watching streams of videos, and I have neither the time nor the motivation to create videos myself. I’ve become a content consumer. I just can’t resist those amazing pet videos.

My behavior is not unique among internet users, even if their choices differ from mine.

Facebook is no outlier on the internet. Media consumption online is at an all-time high, and Facebook is trying to keep up with this trend. Internet users spend a significant amount of time there, consuming and generating content. I am using Facebook only as an example. Here is the journey of the internet from the perspective of its preferred content.

The Personal Computer (PC) Era

In the PC era, content on the internet was largely text. HTML, the fundamental building block of a web page, was also built for text. The reasons are not hard to find:

It was easy to generate text; we already had keyboards, inherited from typewriters.

Early internet bandwidth was very low, and text is very small to send over the network.

Once there was enough content, we saw applications emerge to manage it for users, mostly to ease the interaction between content and user. Searching these text documents and ranking them to return great results was the big idea behind Google. Blogging, tweeting, and sharing status updates became some of the first internet-age activities.

The Mobile First Era

Mobile phone users found it hard to generate long-form text because of the limited interfaces of early mobile phones. Even touchscreen phones were not well suited to it. But mobile users loved posting images. Again, the reasons are:

Generating digital images became easy once mobile phones came with integrated cameras.

Internet bandwidth and speed became sufficient to support downloading and uploading images.

With images as content, we saw many applications built to manage, interact with, and share them easily. A range of applications such as Flickr, Photoshop, Instagram, Canva, 9GAG, and many more do business only with images. User consumption behavior also gave rise to new types of images: memes and GIFs. They truly are the lifeline of internet communities.

Yeah... this is getting too predictable. Are we seeing a pattern here?

Whenever a content type has become easy to generate and upload to the internet, whether because of user behavior or new interfaces, a wave of applications has followed to make interacting with it and sharing it easy.

These applications also give rise to new markets by tapping into previously unseen user behavior around that content. Who could have predicted that selfies and memes would become a thing?

The Voice First Era

If we think along the same lines about voice, a type of content that is currently locked inside monolithic audio and video files, we can gain some interesting insights:

Creating voice content doesn’t require anything fancy, just a microphone, which every mobile phone already has.

It’s easy to speak. Easier than typing. Agree?

Internet bandwidth is no longer an issue for streaming high-quality audio and video.

The rapid growth of voice-first interfaces such as Alexa and Google Home shows that people are embracing them, and users are predicted to consume more and more voice content.

Podcasts are slowly entering the mainstream because of “on-demand” user behavior.

According to our market study, there are 150,000 semi-pro podcasters in the US alone, each producing an average of 4 podcasts a month, with each podcast running about 30 minutes on average. That works out to 150,000 × 4 × 30 = 18,000,000 minutes, or about 300,000 hours of new voice content every month from podcasters alone.

So what are the challenges with voice as content, especially casual, user-generated voice content? Why is it not following the trend? We know a few reasons from an infrastructure point of view:

Voice content is locked in big monolithic audio & video files.

It is not available on easy-to-use open platforms. Podcasts are available on iTunes and other podcast apps, but it is not possible to share them, search them, or create byte-sized content from them.

There are no easy-to-use creative tools for voice. Current tools do not differentiate between voice and music, and music is a very different type of content from voice. Music production tools are complicated to learn and master; Adobe Audition, Audacity, and GarageBand don’t even sound simple. They are meant only for professionals.

Can we make internet infrastructure easy to use for this special kind of content? Will users ever be able to simply press Ctrl+C and Ctrl+V on any audio or video?

We at Spext are working hard on these challenges. With the help of speech-to-text technology, we are making interaction with voice content very, very easy, and we mean it!

Spext also helps unlock voice content from big media files and makes it easy to disseminate.

Our goal is to make interacting with voice on the internet as easy as interacting with text. We believe that voice can be a very special kind of content, and a very intimate one for users. It will have its own journey, and we want to be part of it.

Voice has some special properties. In real life it is intimate yet ephemeral, and every voice has its own identity. Will voice tweets be more amazing? What will voice-first apps look like? For now, we have no firm views on how users will behave with voice content.

Leaving you with these questions, I would love to hear your views on voice.


About the author

Ashutosh Trivedi

Co-Founder and CTO of Spext. Wanders in deep thoughts of science, spirituality and human nature. Often goes to trek Himalayan trails.
