The Making of Rang-E-Yaad: A Cinematic AI Music Video
How SBN Media Made an AI Music Video With Flawless Lip Sync, Consistent Characters, and a Memorable Tune
SBN MEDIA TEAM
4/15/2026 · 8 min read


Where It Began: A Decision to Make Something Unique
At Sixteen By Nine (SBN) Media, we have spent years working across the full range of video production, from corporate films and brand campaigns to film marketing assets and product advertisements. Whenever our team needs a creative breather, we sit down and ask ourselves: What is something original we can create today by experimenting with AI video tools?
Rang-E-Yaad was born from this question.
We decided to produce a music video that would be completely AI-generated in terms of visuals. No live shoot. No camera operator on location. No lighting rig. Just a song, a story, and a team of filmmakers working with AI tools the same way a seasoned director works with a crew, with intention, with craft, and with full creative control.
This blog is our honest account of how that happened: what we built and how we built it.
Watch AI Music Video: Rang-E-Yaad
The Song Came First: Human Writing at the Core
Rang-E-Yaad’s lyrics were written by a human writer, and the music composition followed from those lyrics. This matters because the emotional intelligence of the song is entirely human.
The song, titled Rang-E-Yaad, opens with:
इश्क़-ए-जान कोई गिला नहीं, दिल है कहीं और जान कहीं। एक दूजे के ख़ातिर हैं कहाँ, रंग-ए-याद फिर आई नहीं।
These lines carry the weight of a love that has quietly separated. The heart is somewhere. The soul is somewhere else. And the colour of memory, rang-e-yaad, refuses to return. This poetry relies on absolute precision to express profound emotion.
The title itself, Rang-E-Yaad, translates to "the colour of memory." It became the thematic anchor for every visual decision we made. Memory is not just an emotion in this song. It is a visual phenomenon. We had to find a way to render that on screen.
Building the Story: A Relatable Reason for Distance
The song speaks of two people separated, of a love that has not died but has been quietly shelved. The question was: why are they apart? What is the human reason that makes this separation feel earned and not melodramatic?
We landed on one of the most universal situations a person faces, the moment when life choices, career, ambitions, or the next chapter pull someone away from the person they love. Not a dramatic breakup. Not a fight. Simply two people whose paths moved in different directions at a defining moment.
That relatability became our guiding principle for the narrative. We wanted the viewer to see the story and immediately understand the emotional geography without it needing to be spelt out.
Music Production: AI as Composer and Sound Architect
With the lyrics and story locked, the audio production began. The music, the composition, the arrangement, and the sonic texture of the song were produced entirely using AI tools. We used the Suno AI music generator to create this song. This is where AI served as a genuine collaborator, translating the emotional mood of the lyrics into a tune that stays with listeners long after the track ends.
The instrumentation supports the quiet longing of the song rather than overwhelming it. Getting this right required iterative work, listening carefully to how each musical pass matched the pacing of the lyrics and the emotional arc of the story we had planned.
Storyboarding with AI: Building the Visual Plan Frame by Frame
Before we generated a single video clip, we built the storyboard. This is a principle we hold firmly at SBN Media: the effort you put into pre-production determines the quality of the final video. In our experience across hundreds of productions, the result is almost entirely decided before any camera rolls or, in this case, before any video generation prompt is submitted.
For Rang-E-Yaad, we used image generation models, including Nano Banana and Kling, to build out the storyboard visuals. Each storyboard frame was not a rough sketch but a directorial decision. Shot size. Character position. The quality of light in the frame. The emotional register of the character's face. All of this was determined at the storyboard stage so that when we moved to video generation, we were executing a plan, not exploring randomly.
This storyboard-first approach is what separates disciplined AI filmmaking from prompt guessing. We knew what we wanted each shot to accomplish before we prompted AI video tools to generate these shots.
Shot Generation: Expert Use of Veo 3.1
The visual generation for Rang-E-Yaad was done using Google Veo 3.1, and this is where our cinematographic framework became critical.
AI video tools are only as good as the instructions they receive. The tool does not automatically understand that a scene of quiet heartbreak calls for a certain emotion, or that a close-up of the protagonist's face should be framed with deliberate negative space to communicate emotional isolation. These are decisions a skilled Director and DOP make instinctively. In AI video production, our experts communicate these nuances to AI video tools through optimised prompts.
For every shot in Rang-E-Yaad, our prompts addressed these specific elements:
Lensing: What focal length is implied by this shot? A wide establishing shot reads very differently from a compressed telephoto close-up. We described the optical character of each shot in our prompts, guiding the AI toward the visual language we were building.
Composition: Where does the character sit in the frame? Is the horizon line deliberate? Is there foreground layering? Composition in cinema is not accidental, and it should not be accidental in AI-generated cinema either.
Lighting and Exposure: The mood of Rang-E-Yaad required a consistent light palette, soft, slightly muted, with the warmth of memory and the cool undertone of distance. Each prompt carried a lighting direction so the AI understood what kind of world this was.
Emotional Register of the Scene: Beyond technical elements, we had to create the correct emotional tone for each shot. This took many generations and iterations, and we did not stop until we had the right shots. This is the harder part of AI filmmaking: you keep iterating until the output matches the intent.
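The four prompt elements above can be treated as fields of one structured shot description. What follows is a minimal sketch under stated assumptions, not the prompts or tooling used on Rang-E-Yaad. The `ShotSpec` schema, the `build_prompt` helper, and the example values are all hypothetical, and the sketch only assembles text. It does not call any video generation API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotSpec:
    """One storyboard frame as directorial decisions (hypothetical schema)."""
    shot_size: str
    lensing: str
    composition: str
    lighting: str
    emotion: str


def build_prompt(shot: ShotSpec) -> str:
    """Join the fields in a fixed order so no element is silently skipped."""
    parts = [
        f"Shot size: {shot.shot_size}",
        f"Lens: {shot.lensing}",
        f"Composition: {shot.composition}",
        f"Lighting: {shot.lighting}",
        f"Emotional register: {shot.emotion}",
    ]
    return ". ".join(parts) + "."


# Illustrative values only; not the prompts used in the actual production.
close_up = ShotSpec(
    shot_size="close-up",
    lensing="compressed telephoto look, shallow depth of field",
    composition="subject on the left third, negative space on the right",
    lighting="soft, slightly muted, warm key with a cool undertone",
    emotion="quiet heartbreak, composed on the surface",
)
print(build_prompt(close_up))
```

Making every field required means a shot cannot reach the generation stage with, say, its lighting direction left undecided, which mirrors the storyboard-first discipline described earlier.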
The Lip Sync Achievement: Making an AI Character Flawlessly Lip Sync With the Song Tune
Making an AI character lip-sync to a song is fundamentally different from, and considerably harder than, making a character speak standard dialogue.
In spoken dialogue, if an AI character pauses for a fraction of a second too long, it can easily pass as a natural, dramatic breath. In a music video, the audio track is absolute and inflexible. The tempo, rhythm, and beat are locked. You cannot bend the audio to fit the video.
While making the protagonist sing Rang-E-Yaad in the music video, we matched his lip movement, emotions, and singing performance to every pause and nuance of the song.
For this project, we couldn't just describe a scene and hope the lip-sync would naturally follow. Through rigorous prompting, we treated each lyrical line as a distinct, highly controlled unit.
The opening lines, ‘इश्क़-ए-जान कोई गिला नहीं,’ required a delivery that was composed and precise. However, as the song progressed into deeper vulnerability, the physical performance had to shift. We mapped the generation not just to the phonetics, but to the emotional core of the song tune itself.
The final result is a performance that holds up to the closest scrutiny. The character’s mouth movements, pauses, and emotional expressions feel completely in tune with the song.
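The idea of treating each lyrical line as its own unit, with the audio timing fixed, can be sketched as a small planning step. This is an illustration under assumptions rather than our production pipeline. The timestamps are invented, `MAX_CLIP_SECONDS` is an assumed per-clip limit that should be checked against the tool in use, and the `LyricLine`, `ClipUnit`, and `plan_clips` names are hypothetical.

```python
from dataclasses import dataclass

MAX_CLIP_SECONDS = 8.0  # assumed per-clip limit; verify against your tool


@dataclass(frozen=True)
class LyricLine:
    text: str
    start: float  # seconds into the locked audio track
    end: float


@dataclass(frozen=True)
class ClipUnit:
    line_index: int
    start: float
    end: float


def plan_clips(lines, max_len=MAX_CLIP_SECONDS):
    """Derive clip windows from the audio timing, never the reverse.

    A line longer than max_len is split into consecutive windows so that
    every unit still maps to an exact span of the song.
    """
    units = []
    for i, line in enumerate(lines):
        t = line.start
        while line.end - t > max_len:
            units.append(ClipUnit(i, t, t + max_len))
            t += max_len
        units.append(ClipUnit(i, t, line.end))
    return units


# Invented timings with placeholder text, for illustration only.
lines = [
    LyricLine("line 1", 0.0, 6.5),
    LyricLine("line 2", 6.5, 16.5),
]
for unit in plan_clips(lines):
    print(unit)
```

Because each unit carries its own start and end time, a shot that drifts out of sync can be regenerated alone without disturbing the rest of the timeline.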
Character Consistency: Emphasis on Continuity
Anyone who has worked with AI video generation understands that continuity does not come automatically. The AI has no memory of what it built in the previous shot. If you are not deliberate about maintaining your character's appearance, you can end up with a different person in every scene.
We handled character consistency in Rang-E-Yaad through rigorous work in the storyboarding phase, followed by expert prompting in the video generation phase. Every prompt that featured the protagonists carried a consistent description of each character: physical appearance, clothing, and the quality of presence we were building across the narrative. We treated these character parameters the way the costume and makeup departments treat their continuity sheets on a live set. Nothing was left to chance or assumed to carry over.
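One way to make sure a character description is never left out is to keep it in a single place and prepend it to every shot prompt automatically. The sketch below assumes a hypothetical `CHARACTER_SHEET` dictionary and `with_continuity` helper. The character description in it is an invented placeholder, not the actual protagonist of Rang-E-Yaad.

```python
# Hypothetical continuity sheet: one canonical description per character.
# The description below is an invented placeholder.
CHARACTER_SHEET = {
    "protagonist": (
        "man in his late twenties, short dark hair, light stubble, "
        "grey linen shirt, calm and reserved presence"
    ),
}


def with_continuity(shot_description: str, characters: list[str]) -> str:
    """Prefix a shot description with the full sheet for each character.

    Raises KeyError if a character has no sheet, so a shot can never be
    generated with an improvised or missing description.
    """
    missing = [c for c in characters if c not in CHARACTER_SHEET]
    if missing:
        raise KeyError(f"No continuity sheet for: {', '.join(missing)}")
    sheet = " ".join(
        f"{name.capitalize()}: {CHARACTER_SHEET[name]}." for name in characters
    )
    return f"{sheet} {shot_description}"


print(with_continuity("He looks out of a train window at dusk.", ["protagonist"]))
```

Since the model has no memory between shots, repeating the full description in every prompt is deliberate. Editing the sheet in one place then changes the character everywhere at once.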
The output across the video maintains coherent, recognisable central characters whose visual identity is stable throughout. This is a result of applying traditional filmmaking discipline to AI videos.
Emotional Authenticity: Going Beyond the Surface Performance
The song Rang-E-Yaad deals with love, separation, quiet longing, and the kind of bittersweet acceptance that only comes with time. These are emotions that can look hollow very quickly if they are not guided carefully.
The line हाँ खुश हूँ मैं, अब ये कहता हूँ ("Yes, I am happy, this is what I say now") is a performer's moment. It is not simply a statement of happiness. It is a performance of happiness while something else entirely is happening beneath the surface. Getting the AI to render that kind of emotional layering required us to describe the emotion in filmmaking terms, not in vague feeling terms.
The love scenes and the separation sequences were handled with the same level of emotional attention. We wanted nothing in this video to look performed or surface-level. The emotions needed to feel inhabited, the way they do when a good actor disappears into a role.
The Post-Production Assembly: Bringing the Story to Life
With all shots generated, the post-production phase brought everything together. This is where the editorial instinct took over, reviewing every shot against the lyrics and the story, sequencing the visual narrative so it moved in sync with the emotional arc of the song, and making decisions about pacing.
Audio and visuals were carefully matched. The editing rhythm followed the musical phrasing, letting the song breathe where it needed to and cutting with the beat where energy was required. This is a craft that has nothing to do with AI and everything to do with experienced editorial sensibility, with knowledge of filmmaking and cinematography.
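Cutting with the beat comes down to simple arithmetic: at a given tempo, beats fall every 60 / BPM seconds, and a rough cut point can be moved to the nearest one. The sketch below is only an illustration of that arithmetic. The tempo and cut time are hypothetical, the `beat_times` and `snap_to_beat` helpers are invented for this example, and in practice an editor makes these calls by ear in the editing timeline.

```python
def beat_times(bpm: float, duration: float, offset: float = 0.0) -> list[float]:
    """Return the time in seconds of every beat from offset up to duration."""
    interval = 60.0 / bpm
    times = []
    n = 0
    while offset + n * interval <= duration:
        times.append(offset + n * interval)
        n += 1
    return times


def snap_to_beat(cut: float, beats: list[float]) -> float:
    """Move a rough cut point to the closest beat."""
    return min(beats, key=lambda b: abs(b - cut))


# Hypothetical tempo of 120 BPM: one beat every 0.5 seconds.
beats = beat_times(bpm=120, duration=10.0)
print(snap_to_beat(3.3, beats))  # 3.5 is the nearest beat to 3.3
```

Snapping is useful only where energy calls for it. In the quieter passages, where the song needs room to breathe, the rough cut point can simply be left where the editor placed it.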
What Rang-E-Yaad Demonstrates About AI Music Video Production
The finished video is evidence of what AI video production looks like when it is led by filmmakers who understand both cinematic craft and the technical requirements of working with generative tools.
The lip sync is accurate and expressive. The characters remain consistent across every scene. The emotions are layered and genuine rather than superficial. The cinematography follows deliberate compositional and lighting choices. The music is melodious, and the lyrics, written by a human, carry a feeling that no AI supplied.
At SBN Media, our position on AI video production has always been clear: AI does not replace filmmaking expertise. It extends it. The tools are extraordinary, but they respond to the quality of direction they receive. Rang-E-Yaad is a project where that direction was detailed, disciplined, and grounded in years of video production experience.
FAQs: AI Music Video Production at SBN Media
How long does an AI music video production take?
For a project like Rang-E-Yaad, the timeline from lyrics finalisation to completed video runs approximately 2 weeks, factoring in story development, storyboarding, shot generation, and post-production.
Is the music in Rang-E-Yaad AI-generated?
Yes. The lyrics are human-written, and the musical composition and arrangement were produced using AI tools. The result is music that serves the emotional character of the song.
Which AI tools were used for the visuals?
We used Google Veo 3.1 for video generation, and Nano Banana and Kling for storyboard image generation. The prompts were developed in-house by our production team.
How did you maintain character consistency across the music video?
Through a continuity framework embedded in every prompt. We treated character appearance parameters the way a live production treats costume and makeup continuity, as a non-negotiable constant across all shots.
Can this approach be applied to brand films or advertisements?
Absolutely. The disciplines used in Rang-E-Yaad, storyboarding, cinematic prompt engineering, character consistency, and emotional direction, are directly applicable to commercial video production. We apply the same rigorous quality control framework to AI brand films and product advertisements.
What makes your approach different from simply using AI tools?
The filmmaking framework behind the prompts. AI tools respond to the quality and specificity of the direction they receive. Our team brings production experience that shapes every prompt as a directorial decision, not a keyword guess. That is what produces consistently better results.
What This Means for Brands and Content Creators
Rang-E-Yaad is a music video, but the disciplines it embodies are directly applicable to brand films, product campaigns, OTT content, and any video project where creative quality and emotional resonance matter.
The ability to produce a music video with believable lip sync, character consistency, cinematic shot design, and emotional depth using AI tools means that the cost and logistical barriers that once limited ambitious video production have fundamentally shifted. What required a large crew, multiple shoot days, and significant location costs can now be approached differently when AI is paired with experienced filmmaking direction.
This is the exact opportunity Sixteen By Nine (SBN) Media is offering to major record labels and music studios. We are ready to take your artists' tracks and turn them into visually spectacular music videos, bypassing traditional production bottlenecks.
Let’s build something extraordinary.
Contact us to elevate your brand's content.
© Sixteen By Nine Media 2024. All rights reserved.
SBN Media | AI Video Studio & Corporate Film Production – Mumbai, India
Specialized in AI-powered corporate videos, brand films, product ads, and multilingual content
