Why the Developer Experience Is Key to a Successful Data Product Strategy
Big Data LDN | YouTube Transcript
Summary
Core Theme
The core theme is that achieving both speed and robust data management practices in modern data architectures, like data mesh, requires a strong focus on the developer experience, facilitated by a self-service data platform and DataOps principles.
Transcript
Okay guys, ready to go. My name is Paul Rankin, ex-Roche, ex-IBM, ex-Accenture. I'm now freelance, and I do a lot of consultancy for a number of organizations. Today I'm going to talk a little bit about the developer experience, especially when it comes to data products and distributed architecture, data mesh if you like.

What I see working with a number of organizations today is really two patterns when it comes to this kind of stuff. One is your very central, monolithic IT type of pattern: usually centralized, built around a data warehouse or a data lake, very stable, with a highly skilled team of developers, data engineers, DBAs and so on. Then you have the complete opposite, which is essentially very decentralized and business-led, usually people working with minimal data management practices, let's call it that, but it's fast. And I think this is where I see businesses really struggling today: they want fast, but from an IT perspective they also want good data management practices, they want governance, they want security, and at the same time they don't want to be restricted and blocked by those traditional IT governance processes. So how do we get businesses to be fast and to be secure, to be governed, sticking to these good data management practices? Because it shouldn't be an either/or. Businesses can expect to go fast, but go fast well and governed.
You'll probably be familiar with this scenario, and it's certainly what I'm finding in a lot of companies I'm working with now. The business asks IT: we need our data to solve a certain problem, we need it now, we need to go fast. IT asks them: well, we need to follow the processes, please send your requirements and we'll add them to the backlog. And usually what happens is IT says, well, six months later we're ready to start, and by that time the business has said no, no, we're just going to go on our own, we're going to get our good friends from Accenture or TCS to help us here, and hey presto, two weeks later they've got a team in and they've gone it alone. Anybody familiar with this? It's probably quite a familiar process.
All right, so introducing data mesh, or data products, domain ownership, whatever you want to call it. This was us three years ago at Roche; my good friend Omar is in the audience here. We thought that data mesh would solve all these problems for us. At Roche we were very slow, we had very long life cycles, very long release cycles, the business wanted to go fast and IT was a little bit slow. So we thought: bring in the data mesh, guys. We're going to work with Zhamak, we're going to work with Thoughtworks, everything's going to be rosy.

What I find today, speaking to other customers, is how the principles get misread. Data as a product: I get a lot of "we'll just change our data sets to data products, we'll just rename them and everything will be fine." Self-service data platform: we've got AWS, no problem, that's covered. Domain ownership: what usually happens, and what I see, is the IT department splits itself into verticals, business-driven or business-aligned verticals, and that is what a lot of people see as domain ownership nowadays; I think it's because it's easier to do that than to face the difficult part. Federated governance: actually, speaking to a lot of businesses, for one they don't actually know what it means, and even when they think they know what it means, a lot of them take it to mean that we don't have to follow the IT governance anymore, we can absolutely go it on our own.
So for companies that want to follow this new transition to domain ownership and data products, usually what happens is, yes, cross-functional teams arrive, because that's clearly how they can deliver data products, how they can deliver this new way of working. What I see in a lot of companies is that developers are onboarded to these cross-functional teams, and the business or the managers of these teams say: okay, here are all our tools, please build us these data products. Here's our AWS account, here's our Git repository, here's our data quality tooling, etc. etc. And what do you expect them to do? It was okay when they were part of a central IT team, because then they could work with each other; it was very centralized, highly skilled developers. But as soon as you move to this distributed architecture, this distributed ownership, these developers are brought in and left on their own a little bit. That's what I find today as people move to this, and I really feel sorry for developers; they're having an absolute nightmare at the moment. Just moving to data products and cross-functional teams alone is not going to solve your problems, and this is where I find the developer experience is so important here.
Now, introducing the self-service data platform, and this is key to the developer experience. Basically, what we're trying to do here is abstract a lot of the complexity away from these developers. What we want is for the developers to come in and have a guided developer journey, which I'll show you in a minute, and not have to worry about: how am I going to connect to my source system, how am I going to secure my data at a granular level, how am I going to meet all these governance policies that I need to meet for regulatory reasons? We as a data platform team, which is essentially the IT team reborn as an enablement team if you like, need to be a service provider to these data product teams. We need to enable these developers to be fast, but also to build in a secure and governed way, with good data management practices.
All right, and this is how I portray it. A lot of you who have listened to me before may have seen this portrayal of a three-tier data platform, all focused around these persona journeys. What we're going to focus on a little bit today is the developer journey. If you think about the developer journey (this is one of the things we do with a lot of customers, portraying and visualizing that journey), most developers now get onboarded and are asked to build data products, and they may know the tooling, they may know technically how to build it, but they need to understand what the journey is. Building a data product clearly involves a lot more characteristics, and can be a lot more complex, than the data sets and data warehouses we had previously been building. So what I want to do is talk a little bit about and show you the journey as a developer, from ingestion through to transformations and data quality, all the way through to publishing and operating data products.
So if we look a little bit at the change in operating model: back in the day we started with the monolith, we had data warehouse teams, data teams, very central, very highly skilled, governed in a way, publishing data sets, or data marts if you like, or even reports in some cases. Now we move to this distributed ownership model and we have two different types of teams. One is your data platform team, which we discussed before. The job of the data platform team is really to provide services to the data product teams, and since we're talking about the developer, most of the services we'll talk about today are services for the developers of those data product teams. We need to provide them, out of the box and in a governed, secure way, with the means to ingest data, to build and manage their data, to do their transformations and their modeling, to publish to the catalog if you like, providing the right metadata, the right context, the right access control capabilities. All of these things we need to provide as a service. When I go into a lot of companies, too often you see everyone trying to rebuild exactly the same stuff: they're all trying to rebuild their ingestion frameworks, they're all trying to rebuild their transformations and their data models, they're all trying to connect to LDAP for their access controls, and so on. What we need to do as organizations is move all of this back to a service: offer these capabilities as a service and abstract them away from the developer teams.
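None of this appears as code in the talk, but to make the "capabilities as a service" idea concrete, here is a minimal sketch of one such service, access control. A data product team declares a small policy and a platform-owned job renders the Snowflake grants; the policy format, role names, and schema names are all invented for illustration.

```python
# Hypothetical declarative access policy a data product team submits;
# the platform team's service turns it into the actual grants.
POLICY = {
    "database": "MARKET_DATA",
    "schema": "WEATHER",
    "read_roles": ["ANALYST_TRADING", "ANALYST_FORECASTING"],
    "write_roles": ["DP_WEATHER_DEVELOPER"],
}

def render_grants(policy: dict) -> list[str]:
    """Platform-owned logic: consistent, governed grants for every team."""
    target = f'{policy["database"]}.{policy["schema"]}'
    grants = [f"GRANT USAGE ON SCHEMA {target} TO ROLE {r};"
              for r in policy["read_roles"] + policy["write_roles"]]
    grants += [f"GRANT SELECT ON ALL TABLES IN SCHEMA {target} TO ROLE {r};"
               for r in policy["read_roles"]]
    grants += [f"GRANT INSERT, UPDATE ON ALL TABLES IN SCHEMA {target} TO ROLE {r};"
               for r in policy["write_roles"]]
    return grants

for statement in render_grants(POLICY):
    print(statement)
```

The point is not the specific statements but the split of responsibility: the product team only declares intent, the platform owns how it is enforced.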
And that's really what we'll look at now. The concept that I try to build in, and try to manage for a lot of companies now, is more like a template project, a reference project. For me, this is what DataOps is all about: the experience and the services that you can offer to developers to allow them to go fast, but in a controlled way. What you see here, and I will show you this in real time on the next slide, is that what you want to offer as a platform team are the jobs that the developer needs to build a data product. Then all the developer needs to worry about is the configuration of their own pipeline. By that I mean they clearly need to worry about the connection to their sources, the configuration, the endpoints, the way the data needs to flow, all the capabilities they want to bring in, but they are inheriting the services. And when I say services, one service might be: ingest data from a source into their main repository. That can be any way: through an API, through a SQL Server database connection, scraping from a website. All of these things are services that you want your developers and your data product teams to be able to inherit.
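The talk doesn't show the reference project's internals, so the following is only a sketch of the idea: a registry of ready-made jobs that data product pipelines can pick from by name. The job names and module paths are hypothetical.

```python
# Hypothetical registry inside the platform team's reference project:
# each entry is a ready-made job a data product team can inherit as-is.
REFERENCE_JOBS = {
    "ingest_api_to_snowflake":  "platform.jobs.api_to_snowflake:run",
    "ingest_s3_to_iceberg":     "platform.jobs.s3_to_iceberg:run",
    "scrape_sftp_to_s3":        "platform.jobs.sftp_to_s3:run",
    "publish_to_catalog":       "platform.jobs.catalog_publish:run",
    "run_data_quality_checks":  "platform.jobs.dq_checks:run",
}

def inherit(job_name: str) -> str:
    """A data product pipeline picks the services it needs by name."""
    if job_name not in REFERENCE_JOBS:
        raise KeyError(f"{job_name} is not offered by the platform yet")
    return REFERENCE_JOBS[job_name]

# The developer decides which services their data product uses:
print(inherit("ingest_api_to_snowflake"))
```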
Right, now what I want to show you (I'll talk you through this) is a video I made for a client I'm working with just now. Instead of me explaining it, I wanted to show you, from an actual real client, what the developer journey looks like from the start. It really all starts with a kind of data product discovery workshop, and this here is a little clip of a typical workshop. This is usually more around the definition of data products, the value of a data product, what it means, roles and responsibilities, ways of working. It's usually done on some kind of collaborative board (this one is in Miro), some kind of data product design workshop. Then you move from there to more of a technical modality design, and again you can do this in any way; for some clients we have built a tool that actually guides them to the modalities they're looking for. In this case it's a big Snowflake client, and we use DataOps.live for this client. Essentially we're working out which modality we need for this particular use case. The one you see here is more of a traditional data warehouse type of modality; it could be more of a real-time streaming modality, it could be more of a web-scraping modality, but it needs to be guided as part of this developer journey.
So this is just the next part of the journey, and out of that comes the infrastructure provisioning for this team. Clearly we've decided which modality we need for this data product or this use case, and now we move to the infrastructure provisioning. This should be entirely automated from the information you get from the workshop and from the modality session. You can see here, for this client, that all they need to do is type in the information derived from the modality and the workshop, and then the whole infrastructure is automated and provisioned. In this case it's the Snowflake environment provisioning, it's AWS, it's Secrets Manager, it's the LDAP repository; all of that is automated from that workshop and from that small piece of information the support team provided. You can see this building all the infrastructure, and maybe it takes a couple of minutes so we fast-forward it, but it really is managing the complete infrastructure, ready for the developers to then work with. So you can see in this example all these components are completely automated, and that's the next part of the journey, where the developer can then arrive and start in a seamless way.
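The provisioning itself isn't shown as code in the talk; the sketch below only illustrates the principle that a few workshop-derived values can drive the whole environment build, here assuming AWS (boto3) and Snowflake (snowflake-connector-python). The product dictionary, bucket names, and secret paths are invented, and the LDAP part is omitted.

```python
# Hypothetical: the handful of values captured in the workshop / modality
# session is all the platform needs to provision a data product environment.
import os

import boto3
import snowflake.connector

PRODUCT = {
    "name": "market_data_weather",
    "modality": "batch_warehouse",
    "aws_region": "eu-central-1",
}

# S3 landing bucket and a placeholder secret for the source credentials.
s3 = boto3.client("s3", region_name=PRODUCT["aws_region"])
s3.create_bucket(
    Bucket=f"dp-{PRODUCT['name'].replace('_', '-')}-landing",
    CreateBucketConfiguration={"LocationConstraint": PRODUCT["aws_region"]},
)
secrets = boto3.client("secretsmanager", region_name=PRODUCT["aws_region"])
secrets.create_secret(Name=f"dp/{PRODUCT['name']}/source", SecretString="{}")

# Snowflake database and warehouse for the team; the provisioning job is
# assumed to have its own credentials available as environment variables.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SYSADMIN",
)
cur = conn.cursor()
cur.execute(f"CREATE DATABASE IF NOT EXISTS {PRODUCT['name'].upper()}")
cur.execute(
    f"CREATE WAREHOUSE IF NOT EXISTS {PRODUCT['name'].upper()}_WH "
    f"WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60"
)
```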
We're just checking all these components, and this is what it actually looks like in real time. As I said, this is a big Snowflake client, and you can see here what has actually been provisioned. We call this a market data example: this client is a Swiss energy company, obviously interested in market data, and this is particularly weather data, coming from some weather APIs. You can see here the buckets that were created in AWS, Secrets Manager, and so on. So this is the environment all built, and now along come the developers for these teams. As a developer, I want to create my own developer environment, so I create a feature branch, as you do. Here you go, this is also in DataOps.live, and it creates a feature branch specifically for this developer. This is what developers love: they love having their own database, their own copy of the production database with production data, against which they can build their own features. For me, as a developer this is a dream. They do that, and as you can see there, now they've got their own database.
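The mechanism behind "your own database with a copy of production data" isn't spelled out in the talk; on Snowflake, one common way to do it is a zero-copy clone per feature branch. A rough sketch, with invented database and branch names, assuming snowflake-connector-python:

```python
import os

import snowflake.connector

# Hypothetical convention: one clone of the production database per feature branch.
feature_branch = "feature/dtn-weather-ingest"
clone_name = "MARKET_DATA_" + feature_branch.split("/")[-1].replace("-", "_").upper()

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
)
# Zero-copy clone: the developer gets production data to work against
# without duplicating storage or touching production itself.
conn.cursor().execute(
    f"CREATE DATABASE IF NOT EXISTS {clone_name} CLONE MARKET_DATA_PROD"
)
print(f"Developer database ready: {clone_name}")
```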
Now all they need to do is inherit jobs, and I'll show you this. This is a DTN pipeline (DTN is a weather API), and I'll show you here, in the online developer studio, what I as a developer can then inherit from this platform reference project. This is the really good bit, and it's what I really want to show you: all of these jobs are kind of like service items, if you like. I can ingest data from an API into Snowflake, I can ingest data from S3 into Iceberg, I can scrape from an FTP server; I decide which services I want to use from the reference project. This is the real killer, and it makes a developer's life so much easier. In this case I've decided I want to ingest data directly from an API into Snowflake. I go to the documentation, like you just saw, and I see what I need to put in and how I can use this service, which is also really important. From that documentation: okay, I actually need to put in this information, and this is the information that will trigger my ingestion. So all I really need to know is the connection details and which variables to pass to the pipeline. What you're seeing now are the variables that I will pass, and from there they will trigger their pipeline. And that's the expectation: we don't want developers to spend time having to know how to connect to data sources. We need them to bring the information, bring their connection details, provide the variables, and for their data to arrive.
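As an illustration of how little the developer has to supply, here is a hypothetical configuration for the "API into Snowflake" service: just connection details and the variables the job's documentation asks for. The variable names, the weather-API URL, and the validation step are assumptions, not the client's actual contract.

```python
# Hypothetical: the only thing the developer writes for the "API -> Snowflake"
# service is this configuration; the inherited job does the rest.
PIPELINE_VARIABLES = {
    "SOURCE_API_URL": "https://api.example-weather-provider.com/v1/observations",
    "SOURCE_SECRET_NAME": "dp/market_data_weather/source",  # held in Secrets Manager
    "TARGET_DATABASE": "MARKET_DATA_DEV",
    "TARGET_SCHEMA": "RAW_WEATHER",
    "LOAD_MODE": "incremental",
}

# The service's documentation lists which variables it expects; checking them
# up front is essentially the developer's whole job here.
REQUIRED = {"SOURCE_API_URL", "SOURCE_SECRET_NAME", "TARGET_DATABASE", "TARGET_SCHEMA"}
missing = REQUIRED - PIPELINE_VARIABLES.keys()
if missing:
    raise ValueError(f"Missing pipeline variables: {sorted(missing)}")

print("Configuration valid: triggering the inherited ingestion job with these variables.")
```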
And I think a lot of companies don't really understand what DataOps is. I speak to a lot of people and they say: well, DataOps is just CI/CD, it's just a Git repository, I've got GitLab, I've got a runner, what more do I need? For me, this is DataOps: it's the developer experience, it's being able to manage and configure these services. And they can be anything; they don't just have to be ingestion frameworks. Publishing to the catalog, data quality frameworks, source validations, all of these things.
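To show that these services don't have to be ingestion frameworks, here is a sketch of a data quality check offered the same way: the product team declares checks, and the platform turns them into validation queries. The check vocabulary and table names are made up for illustration.

```python
# Hypothetical declarative data-quality checks a data product team submits;
# the platform's DQ service runs them and publishes the results.
CHECKS = [
    {"table": "RAW_WEATHER.OBSERVATIONS", "check": "not_null", "column": "STATION_ID"},
    {"table": "RAW_WEATHER.OBSERVATIONS", "check": "unique", "column": "OBSERVATION_ID"},
    {"table": "RAW_WEATHER.OBSERVATIONS", "check": "row_count_min", "threshold": 1000},
]

def to_sql(check: dict) -> str:
    """Platform-owned translation of a declared check into a validation query."""
    if check["check"] == "not_null":
        return f'SELECT COUNT(*) FROM {check["table"]} WHERE {check["column"]} IS NULL'
    if check["check"] == "unique":
        return (f'SELECT COUNT(*) - COUNT(DISTINCT {check["column"]}) '
                f'FROM {check["table"]}')
    if check["check"] == "row_count_min":
        return (f'SELECT CASE WHEN COUNT(*) >= {check["threshold"]} THEN 0 ELSE 1 END '
                f'FROM {check["table"]}')
    raise ValueError(f'Unknown check type: {check["check"]}')

for c in CHECKS:
    print(to_sql(c))  # a non-zero result means the check failed
```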
So basically, that gives you an idea of what I term the real developer experience with DataOps. In summary, we're all trying to achieve, or all organizations are trying to achieve, speed at scale, whilst trying to adhere to those good data management practices I mentioned. For me that's really important, and I'm not saying it's easy, but if we can really achieve this speed together with good data management practices, that is where we need to get to. Going back to the beginning, a question I always ask when I work with new clients is: what value does IT bring to the business now? Because at a lot of the customers I work with, IT has kind of lost sight of the value it brings to the party. A lot of the time IT is left with the legacy infrastructure; a lot of companies still have data centers they own themselves, and you see that IT is essentially looking after the legacy infrastructure, the hardware, and is basically just a cost center at the end of the day. IT now needs to rediscover, a lot of the time, the value it brings to the party, and to me this is the value: we do not want businesses to have to understand all the complexities of their data capabilities, and this is really how I try to bring IT back into the game, if you like.
Just before we wrap up, and we'll have time for a few questions if there is any, here is one company I'm working with, a Swiss company, and you can see where they've moved to. With a lot of these metrics, people will say that's impossible, you'll never be able to do this. One-day developer onboarding, complete: you saw it there, it is completely automated, there is no reason to wait any longer. Thirty minutes average time to ingest a new supported source: if there's a service that does, say, SFTP to S3, that's a service the data platform is offering, so why should it take any longer than 30 minutes for someone to put in their connection details for the SFTP server and their bucket details? As long as it's all connected, the data platform is doing the building; all they need to do is bring their secrets, if you like.
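As a sketch of what such an "SFTP to S3" service might boil down to (not the client's actual implementation), here is a minimal Python version using paramiko and boto3. The host, paths, and bucket names are placeholders, and in practice the credentials would be resolved from the platform's secrets manager rather than passed as strings.

```python
# Hypothetical "SFTP to S3" platform service: the data product team brings only
# the connection details; the plumbing is owned by the platform team.
import boto3
import paramiko

def sftp_to_s3(host: str, username: str, password: str,
               remote_path: str, bucket: str, key: str) -> None:
    """Copy one file from an SFTP server into the team's governed S3 bucket."""
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname=host, username=username, password=password)
    sftp = ssh.open_sftp()
    local_tmp = "/tmp/sftp_download"
    sftp.get(remote_path, local_tmp)                        # pull from the source
    boto3.client("s3").upload_file(local_tmp, bucket, key)  # land it in S3
    sftp.close()
    ssh.close()

# The "30 minutes" is the time to fill these values in, not to build the plumbing:
sftp_to_s3(
    host="sftp.partner.example.com",
    username="dp_weather",
    password="resolved-from-secrets-manager",  # never hard-coded in practice
    remote_path="/outbound/prices_2024.csv",
    bucket="dp-market-data-weather-landing",
    key="raw/prices_2024.csv",
)
```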
Ten days average time to publish a new data product: again, in some cases it's a lot lower, and some more complex data products are a lot higher, but essentially we can publish a very simple data product in one day now, and that really is the expectation now. Again, some big savings on ETL licenses, 1.5 million, because when I was at Roche everybody was doing ETL; everybody had their own license for Talend, for Denodo, for Alteryx, for everything, and they were all doing their own ETL. Why do we need this? It's configuration at the end of the day; we're moving to ELT and we're just loading data. And 50 releases every month compared to one release every month: for this company that was a huge win. People were releasing and releasing just for table name changes. Why do I need to wait a month until the next release just to change a name on my table in Snowflake? That's crazy. As long as you have good versioning for your data products, as long as you have the control process in place, why do you need to wait a month for the next release? So this is what I want to get across: for me, DataOps is really speed at scale and good data management practices combined, and that can offer you the best of both worlds.
I've got time for maybe a couple of questions; I've got five minutes left, so if anybody wants to ask any questions, I'm happy to take them.

[Audience question, partly inaudible: agrees with the point about the developer experience journey, but asks what happens when you move over and the teams lack the skills.]

So Simon is asking a question about the data product teams, that they don't necessarily have the right skills in the particular team, correct, is that what you said, Simon? And you're really asking how we can accelerate that, how we can get the skill set up in those teams. What I'm saying is that you actually don't need the highly skilled DataOps engineers in those teams to do this. The very highly skilled DataOps engineers are providing the services within the data platform team for the data product teams to inherit, and essentially all you need is your vanilla data engineers, not necessarily as highly skilled as the ones in the data platform team. That's how you get that scale.

So again, Simon is saying you may not have those skills even in that team, and that this then is not a fit for every company. One: distributed ownership and data mesh is not a fit for every company, that's absolutely for sure. Now, a number of clients I'm working with are actually doing something slightly different: they are providing the same model, the same responsibility of ownership, a data product team and a platform team, but they're building a data product team within IT that looks after all the, let's call them, source-oriented data products. I think that is a way that might solve some of the problems you're maybe having, Simon. This is not wrong. It's not pure data mesh the way Zhamak envisioned it, but it's not wrong; it works, it absolutely works. Not every organization is going to implement a data mesh or distributed ownership in the same way, that's clear. Some organizations are a lot more mature than others, some have a bigger budget than others, but either way, I think this model, using DataOps, will actually accelerate your journey.

So that's me out of time, guys, but thank you very much. Grab me if you want to ask more questions, and if you want to contact me, here are my contact details.