This webinar introduces Talend as a comprehensive data management platform that addresses the growing challenges of data volume, variety, and velocity by providing solutions for data integration, quality, preparation, and stewardship.
Good morning, everybody, and welcome to our webinar. We're just about to get started in a minute or two, so please bear with us. Okay, all right. Well, it's nearly 10 o'clock, so we'll begin. My name is Ciarán Kirk, I'm General Manager with IMGS, and with me today is Aditya. Hello everyone, good morning. Morning.

So this morning we're going to speak about data quality using Talend. IMGS is the Gold Partner for Talend in Ireland and has been for over two and a half years now, and Talend are a market leader in the provision of data management and data integration solutions. So we're just going to get going. What you're seeing at the moment is our logo coming up here on screen, just so you don't see Teams the whole time, and there we go.
So, we have a data challenge. In the current market there are unrelenting expectations: we're seeing increasing data volumes, increasing data types, increasing sources, so the days of just one data set, or one type of data set, are kind of gone, and increasing use cases: people are trying to do more and more with data, and more and more people want to be able to use data. On top of that there is unprecedented innovation. We're seeing new cloud platforms and new types of databases, so no longer are we just talking, you know, SQL relational databases; we're now talking NoSQL databases, big data, data warehouses, and we're also seeing a drive for real-time data processing. And now, with the advent of things like machine learning and artificial intelligence, we're seeing real changes in what we can do with information: how we can analyze it, how we can fix it, and how we can improve it. On top of that, the biggest source of growth in information is the Internet of Things: we're seeing a massive explosion of data coming through sensors, and this is all driving massive change.
Talend are at the front of this change in data management and have over 1,500 customers worldwide, from Michelin to Ticketmaster to Domino's to Lenovo, so there are great references in how people are using data and data analytics to actually gain a competitive advantage, improve their processes and, in turn, improve their bottom line. Talend is one single unified platform, and that platform covers cloud and big data integration. For any kind of integration of data, Talend supports over 900 different connectors into different information sources, so we can power the flow of data from one area to another. Talend also comes with an API and application integration solution, so that you can build your APIs and build your web services without needing to do programming; a very visual solution there.
Talend also provides the ability to do master data and metadata management. Once you start understanding your data, then you can derive real value out of your analytics, if you understand, you know, where your customer record information is, where your asset record information is stored. One of the areas we'll look at today is data stewardship: when you find data quality issues, being able to understand and fix those data quality issues, and provide the data back to the business experts to actually resolve the issues with the data. We also provide the capability to do data preparation: being able to provide self-service to an analyst, to prepare their data before they run it through the analytical process. And all of this from Talend comes under a single unified data platform, where each piece runs as a module, all done through embedded governance and collaboration. In essence, in today's world data is key, but it's very important that data is provided to the right people in the right way, and that you don't get breaches and you don't get data quality or data governance issues. So the key part of all this integration and all this powering of data is to ensure that the right people can get access to information.
Talend of course works on all forms of data, from on-premise to the cloud, and has full support for Amazon, Azure, and Google. But also now we're seeing a move towards organizations having a hybrid approach to the cloud: possibly they're using Office 365 for their office applications, using Amazon Redshift for storage, and then using Google's BigQuery for querying and analytics. So actually that combination, the hybrid model, is what we're seeing in the industry going forward.
So the Talend data management platform allows you to design your integration through a visual GUI, and as I mentioned already there are 900 pre-built connectors. The Talend interface generates Java or SQL, so we run our code natively, and again, if this was on Hadoop we would be running natively in Spark. This means it's fast, and it also means it's open: you're not being locked into a proprietary integration engine, and if you do want to take the code that Talend generates and use it in a different way, you can at a later stage. One of Talend's big strengths is the ability to collaborate, to provide a shared repository for all integrations, and to provide automatic documentation. This means that if you are building complex data workflows, and you're building them as a team, you can all work together with a single repository; you don't have to leave the application to manage your workflows.
One of the areas on top of that is the whole area of data quality, cleansing, and profiling. Talend comes with a complete set of solutions for cleansing your data: firstly allowing you to profile how good or bad the data is, then to cleanse, enrich, and match, and through all of that being able to share that metadata. Aditya will show us some of these capabilities of Talend this morning. As I mentioned before, Talend also has a centralized scheduling, monitoring, and administration center, so that you can publish out the production version of your interfaces, run test versions and development versions, and handle all of this through a single Administration Center. And Talend has the ability to run at cluster scale, with load balancing and failover, and also with code optimization capabilities for running at large scale.
So again, with this explosion of data, Talend can handle not millions of records but billions of records and still be able to scale up easily. With Talend you can liberate your data by putting more than 100 times the data to work, by being able to automate the data life cycle and operationalize higher volumes of collected data. (I think we've got starlings nesting outside our office; I think you can hear them in the background. I'm not sure where they've come from, but they're here.)
Also with Talend you can discover and self-serve; I've mentioned this already: being able to provide the data to the people who need it in a governed, planned manner. You allow them to do their analytics and view the data, but only see the data that they've got the rights to see, not just opening up web services, or opening up, you know, dumps of data to Excel or CSV that people can then clone and copy, which might then end up on a USB disk at some other organization or somewhere it shouldn't be. As I've mentioned already, we can optimize performance, which means you're at a fundamentally lower cost; with Talend you can put 100 times more data to work on the same project. You're able to unlock the data from the infrastructure, so data is managed independent of infrastructure, with no lock-in. So again, no matter what format your data is in, we can get that data out and then use it. We're not saying that you have to change your underlying infrastructure; we're not coming in and saying you must rebuild everything to use Talend. You can take advantage of current and all future data and analytics initiatives by being able to get that data out of its silo and into where you want to use it. And we always run our code natively, so you can realize new speed and cost structures, but again no proprietary infrastructure is required: Talend is built for modern IT, born for the multi-cloud, hybrid, Internet of Things, exabyte-scale data world.
So with Talend, the basic product is Talend Studio, which has an intuitive drag-and-drop environment where you can build your integration. As you can see on the right-hand side, there are all the different formats that it can connect to, and then you just build, just like any ETL code: you build your workflow from left to right, take data from the source, do something with it, and push it out.
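As a rough illustration of that left-to-right pattern, here is a minimal hand-written Java sketch of the same idea. This is not Talend-generated code; the file names and the single uppercase transform are hypothetical:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// A minimal hand-written sketch of the left-to-right pattern a Talend job
// expresses visually: read from a source, transform each row, write to a sink.
// File names and the uppercase transform are hypothetical, not from the demo.
public class SimpleEtlJob {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get("customers_in.csv"));
             BufferedWriter out = Files.newBufferedWriter(Paths.get("customers_out.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",", -1);      // source: parse the row
                fields[0] = fields[0].trim().toUpperCase(); // transform: tidy one field
                out.write(String.join(",", fields));        // sink: push it out
                out.newLine();
            }
        }
    }
}
```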
And as you can see at the bottom there, we have our code export: when you create your components, you will see the code that gets created. That's also very useful because, if the Talend component provided doesn't do quite what you want, you can actually go in and edit that component in the code, add your own capability to it, and then save it as a reusable component, and all of that can be handled through the interface. So the interface is a job and business designer, a graphical assistant that generates the code, with over 900 connectors available for use. And because Talend is built on an open-source technology stack, it's very quick to adopt new data sets and new types of data; you can very easily get new connectors to new data at the touch of a button. The advantage of Talend Studio over hand coding is about a 20 percent productivity increase compared to hand coding, and it allows you to build integration logic and tests and increases knowledge reusability.

Because Talend Studio has this built-in versioning and collaboration system, you can have better team sharing of work through a shared repository, shared with all stakeholders. All items are versioned, and you can get back to any version of your code easily. So again this increases reusability, which cuts custom coding costs and developer operation costs, and it enables clear communication across development, and therefore less duplication, through better visibility of projects and jobs.

The Talend Administration Center has a web-based interface where you can configure and manage users and assign their roles and privileges, and where you can manage the configuration and deployment of jobs. It has a monitoring system, so you can track what's going on with your jobs and web services, and it provides a centralized view of messages, warnings, and errors. And then you're able to build your schedules and monitor your workflows. And this whole Talend platform architecture is all under one hood: you've got the Job Server, the execution server where you can run your jobs; we also use Nexus as the code and metadata repository, and then use Git for the source code, with Nexus holding the generated code. So that means it's all done in one place; you don't have to leave the application and use up operations time.
With Talend Data Quality, again, we have direct access to data sources and complete and extensive controls, in open source. With Talend we can access structured, unstructured, and multi-structured data, so again, with the explosion of JSON and XML, we can bring that data in and use it against normal relational data. You can actually view across them and then get to that golden record, which we're going to try and show this morning. From there we can standardize your data and match it, and also do data enrichment: we can join up to the 900 connectors, so we not only take data from the sources but move it, bring in new values, and then monitor data on an ongoing basis. A lot of organizations might do a data quality project as a one-off, and then come back to it months later and try to do it again. That's not how we do it with Talend: we believe in data quality being an ongoing process, so you're measuring, analyzing, and improving data always. So if a certain department, or even a system, is continuing to create bad data, Talend will report on that and then you can actually make sure it's fixed, and when we operationalize that, it becomes data quality in motion.
So now, on to data quality services. With the data quality tools inside Talend we can quickly identify data quality problems, and produce statistics, pattern, correlation, and reconciliation analysis.
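To make some of those statistics concrete, here is a small illustrative Java sketch of column profiling: null counts, distinct values, and pattern conformance. Talend's profiling perspective does this declaratively through the GUI; the column name and the five-digit zip rule below are just assumptions for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.regex.Pattern;

// Illustrative only: a tiny column profiler producing the kinds of statistics
// mentioned in the talk (null counts, distinct values, pattern conformance).
public class ColumnProfiler {
    public static void profile(String name, List<String> values, Pattern expected) {
        long nulls = values.stream().filter(v -> v == null || v.isBlank()).count();
        long distinct = values.stream().filter(Objects::nonNull).distinct().count();
        long matching = values.stream()
                .filter(v -> v != null && expected.matcher(v).matches()).count();
        System.out.printf("%s: rows=%d nulls=%d distinct=%d patternMatch=%.1f%%%n",
                name, values.size(), nulls, distinct, 100.0 * matching / values.size());
    }

    public static void main(String[] args) {
        // Assumed sample column and an assumed 5-digit zip rule.
        List<String> zips = Arrays.asList("D02 X285", "12345", null, "9021", "54321");
        profile("zip_code", zips, Pattern.compile("\\d{5}"));
    }
}
```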
You can generate Talend jobs directly from the profiling perspective. So if we see that there is bad data, data that's failing our data quality profiles and analysis, we can handle that data so that it either doesn't get pushed out to a warehouse, or is pushed to another system to be fixed, or, as Aditya will show this morning, is pushed to the Stewardship app, where you're going to ask the business to actually comment on or fix the bad data. With Talend you can also generate dashboards, available from the data quality portal, to help collaborate on the profiling process. And all these data quality rules can be shared across your data integration: once you build these data quality rules, you can use them as part of your data integration. You may be doing your data quality profiling as part of a migration process, where you're going to take a heap of data and push it into a new system; the first thing you're going to do is data quality analysis on the source data, and when that's done you can use those data quality rules in the migration scripts. Or if you're doing a data warehousing project, where you're bringing data from a number of systems to populate a data warehouse for, say, business intelligence or analytics, again you can use Talend Data Quality to analyze the source databases and then use those rules before you publish to the data warehouse. With Talend Data Quality it means that you can then generate high-quality information,
all done using data governance. So we're going to explore and profile our data, cleanse, standardize, and enrich it, and then match and certify the data, so that when it's published people can understand that this is good data. And it allows you to map any data source to your business context, so you know what your customers, products, assets, organizations, and locations are. The benefit is you have more accurate information, so that when you are running your analytics you're getting good answers, not bad data.
I've mentioned self-service, data stewardship, and data preparation a few times. With Talend we provide self-service to the business through data collaboration tools that data analysts and IT will love. Talend Data Preparation provides the ability to freely access, cleanse, and enrich data in minutes, not hours, and then Stewardship allows you to fix, manage, and certify data. The subtle difference between preparation and stewardship is that data preparation doesn't fix or affect source data, while data stewardship provides a controlled manner to update sources, to update or approve. So with data preparation, if you want, you can let an analyst change their data before they use it in their BI or their reporting, but they're not changing the source: you're allowing them to tidy up the data before they use it, but doing it in a controlled manner. With Stewardship, you are providing the ability to fix source data. So Data Prep allows self-service data access, cleansing, and integration, and it scales through the Talend Data Fabric. It allows collaboration across teams in daily usage, with IT governance for data security; again, you're only providing the data that they're supposed to see, and allowing them to change what they want to change for their own reports, not changing the source.
So you can now turn ad hoc data preparation into a fully managed data definition process. When those rules, or recipes, are created in Talend Data Preparation, IT can bring them back and apply them to the full data set as part of integration. So when the next data is run in the next flow into the data warehouse, that integration can use the recipe that the business analyst has already written and validated.
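As a sketch of that recipe idea, the analyst's preparation steps captured as an ordered list of transformations that IT can replay over the full data set, here is a minimal Java illustration. The two steps (trim and title-case) are hypothetical recipe steps, not from the demo:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// A sketch of a "recipe": preparation steps recorded once, replayed later
// over the full data set. The two steps below are hypothetical examples.
public class RecipeRunner {
    public static void main(String[] args) {
        List<UnaryOperator<String>> recipe = List.of(
                String::trim,                                // step 1: trim whitespace
                s -> s.isEmpty() ? s
                        : Character.toUpperCase(s.charAt(0))
                          + s.substring(1).toLowerCase());   // step 2: title-case

        List<String> fullDataSet = Arrays.asList("  aLAn ", "MARY", " joHn");
        fullDataSet.forEach(value -> {
            for (UnaryOperator<String> step : recipe) value = step.apply(value);
            System.out.println(value); // Alan, Mary, John
        });
    }
}
```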
But today I'm not actually going to demo data preparation; if you are interested in more on that, we have a previous webinar where we introduced Data Preparation, and it's on our website, so you can see that. Today we're going to focus on data quality and data stewardship. So, Data Stewardship engages everyone in data quality, not just the data experts. It provides a point-and-click approach for curation and certification: if you want somebody to say, yes, these records are correct and can be published to our website, you can use Data Stewardship for that. It also allows you to do campaigns and arbitration, and grouping and merging of data; so again, what the demo this morning will show is providing that one golden record.
So if you know that there are three records, maybe one record coming from your asset data system, one record coming from your GIS, and another coming from another asset system, but you want a single view of the asset, the Stewardship app allows you to specify which columns of which record are going to be used in the golden record. It uses a thing called survivorship, where you can define which columns survive the process and become that golden record.
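As a rough Java sketch of survivorship, under assumed source names and priorities (CRM, GIS, and an asset system, none of which are specified this precisely in the talk), picking, per column, the first source in the priority list that has a value:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A sketch of survivorship: for each column, the first source in an assumed
// priority order that has a non-blank value "survives" into the golden record.
public class Survivorship {
    static final Map<String, List<String>> RULES = Map.of(
            "name",     List.of("CRM", "ASSET", "GIS"),
            "location", List.of("GIS", "ASSET", "CRM"));

    static Map<String, String> golden(Map<String, Map<String, String>> bySource) {
        Map<String, String> result = new LinkedHashMap<>();
        RULES.forEach((column, priority) -> {
            for (String source : priority) {
                String v = bySource.getOrDefault(source, Map.of()).get(column);
                if (v != null && !v.isBlank()) { result.put(column, v); return; }
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> records = Map.of(
                "CRM",   Map.of("name", "Pump Station 7"),
                "GIS",   Map.of("name", "PS-7", "location", "53.35,-6.26"),
                "ASSET", Map.of("name", "Pump Stn 7", "location", ""));
        // Golden record drawn per column from the preferred source.
        System.out.println(golden(records));
    }
}
```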
And of course, no matter how good the machine learning or AI tools you use to fix data, there will always be that five or ten percent that needs human intervention, and Data Stewardship provides that ability, to apply human intervention to correct data, but in an easy-to-use manner. And again it's governed: you know who makes a change and when, and you're providing it to the right people, who actually have the information to make the update. And with that there's a full audit trail of all the resolutions, so you know who changed what, and when.
So, I've talked a lot, but now I'm going to hand over to Aditya, who is going to take us through a demonstration of the highlights of the Talend tools.
So, you already mentioned a lot of things, but we would like to cover two things here: data profiling and data quality. As you mentioned, with data preparation, with data profiling, sorry, we can see where the problem lies, what inconsistent data we have, and then basically profile it on a tabular basis, at a columnar level, or at a database level as well. The other thing that you mentioned about data quality is that we talk about data in motion, or data quality in motion. So basically, what do we mean by that? We don't want a one-off clean-up or, you know, quality check on the data; it should be an integrated and continuous process, so that your data is kept up to date and consistent across all your systems and databases. Along similar lines, we have a kind of hypothetical scenario, but one which is very relevant in the industry.
What we have today is a telecom system, and we will be showcasing two databases; if I can show you here, one is CIF and the other one is CRM. The idea is that the system updates overnight with the new customers that have been joining our telecom service, and simultaneously the system updates the CIF database and the CRM database. Here we are treating the CRM database as our golden database, or the golden record: you can see that we have all the customers with their customer type, whether they are prospect customers, or they are customers, or they are beneficiaries. And what the CIF database has is basically the new customers that have been enrolled overnight, or in the last few days, and the system has updated those in the CIF database.
so the first thing that i would like to
on the two systems so this here you can
see that this is
my talent uh integration
application basically this is desktop
tool that i have and there are two
perspectives one for integration and one
for profiling and data qualities has
switched to
a data profiling perspective and what i
am doing is i am
running a report which is basically a
profiling report
on two of the databases that we will be
kind of
discussing one is cif and the crm now
the idea is that
the first thing in the morning you come
and someone kind of generates a report
as to okay
if the records have been updated if both
the databases in the system have been
updated are they consistent with each other
other
are they having the same data or is
there any
uh you know discrepancy so i think that
which and if somebody was doing a data
warehouse projects or they could be
taking this data at the end of the week
before they run it
before they pushed to their data
warehouse they could run the report to
see well
So I have run this report, and as you can see, what I have done is profile my columns in the CIF database and then the CRM database, to see how many active customers we have. For example, a total of 318 customers are supposed to be active, and that's what it's showing in the CIF database. The same report I have generated on the CRM database. As I already told you, the CRM database, basically the golden record that we have, will have prospect customers, customers, and beneficiaries. So if I can show you the report: it shows that 290 are basically customers, and the rest, 710, are either beneficiaries or prospect customers. So we can clearly see that there is a mismatch. What we basically want is that both our databases are in sync, so all the customers that are active in the CIF, which is 318, should be reflected here; rather than 290 it should be 318. So we now know that there is a discrepancy: somehow the system updated the CIF database but not the CRM database, and what we want is that they are in sync. So again, with the help of data profiling and the data quality reports, it's really easy to understand what's wrong, rather than going into the database and checking it manually. And these reports we can basically generate every single day, or at whatever time of day you prefer, based on the system.
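A minimal JDBC sketch of the same check the profiling report makes, counting active customers in CIF against customer-typed records in CRM. The connection URL and the table and column names are assumptions based on the demo narrative, not the actual demo schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// A sketch of the consistency check: compare active-customer counts across
// the two databases. URL, credentials, and schema names are assumptions.
public class ConsistencyCheck {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/telecom", "user", "pass")) {
            long cifActive = count(con,
                    "SELECT COUNT(*) FROM cif.customer WHERE status = 'ACTIVE'");
            long crmCustomers = count(con,
                    "SELECT COUNT(*) FROM crm.customer WHERE customer_type = 'Customer'");
            if (cifActive != crmCustomers) {
                System.out.printf("Discrepancy: CIF=%d vs CRM=%d (%d records out of sync)%n",
                        cifActive, crmCustomers, Math.abs(cifActive - crmCustomers));
            }
        }
    }

    private static long count(Connection con, String sql) throws SQLException {
        try (Statement st = con.createStatement(); ResultSet rs = st.executeQuery(sql)) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```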
So we have now seen that there is some discrepancy; now let's go into the database and, you know, kind of clean it. I'll go to my integration perspective again, and what I have done is create two simple jobs. Before we clean the database, what I have in my CIF customer table are these active customers, so we'll see some of the data quality features, or the components, with which you can clean your data, so that you have uniform and consistent relational tables going forward. You can see that these are the customer names, and you can see that a lot of the columns don't have a column name, since it's an online process and it might be dumping just a CSV file. So I have a two-step workflow: the first one simply parses the name, splits it on the space, and then creates three additional columns and puts them into a new table in the CIF database. And the second one standardizes the titles; as I have told you, these titles, you know, are not consistent.
So let's run these two workflows. Again, these are two different workflows, but I have kind of combined them, joined them together, into a single workflow, so I'll basically run it. Okay, so this is the first one, which parses the name. It's a data quality component: what it does is split your fields, and it then creates another table, called customer_new, with three additional columns.
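Conceptually, the parsing step does something like this hand-written Java sketch: splitting the full name on whitespace into title, first name, and last name (assuming a "Title First Last" shape; the sample names are made up):

```java
// A sketch of the name-parsing step: split a full name on whitespace into
// title, first name, and last name. Real data would need more care.
public class NameSplitter {
    public static String[] split(String fullName) {
        String[] parts = fullName.trim().split("\\s+", 3);
        String title = parts.length == 3 ? parts[0] : "";
        String first = parts.length >= 2 ? parts[parts.length - 2] : "";
        String last  = parts[parts.length - 1];
        return new String[] { title, first, last };
    }

    public static void main(String[] args) {
        for (String name : new String[] { "Mr John Murphy", "Mary O'Brien" }) {
            System.out.println(String.join(" | ", split(name)));
        }
    }
}
```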
And the second one standardizes the titles. This is again a data quality component: what I'm doing is finding all the personal titles which are not as per my title list, basically a semantic library I have with me saying, okay, these are the valid titles that we should have. It then filters only the invalid rows, and I have a CSV file which has all the valid titles, and then I am replacing all the invalid titles with the valid words.
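A small Java sketch of that standardisation step: validate each title against a set of valid values (loaded from a CSV file in the demo) and replace known invalid variants. The variant map here is illustrative:

```java
import java.util.Map;
import java.util.Set;

// A sketch of title standardisation: valid titles pass through, known invalid
// variants are replaced, anything unknown is passed through for later review
// (matching the Q&A: unknown values still land in the table).
public class TitleStandardizer {
    static final Set<String> VALID = Set.of("Mr", "Mrs", "Ms", "Dr");
    static final Map<String, String> REPLACEMENTS = Map.of(
            "MR.", "Mr", "mister", "Mr", "MRS", "Mrs", "doctor", "Dr");

    public static String standardize(String title) {
        if (VALID.contains(title)) return title;        // already valid
        return REPLACEMENTS.getOrDefault(title, title); // fix known variants
    }

    public static void main(String[] args) {
        for (String t : new String[] { "mister", "Mrs", "MRS", "Prof" }) {
            System.out.println(t + " -> " + standardize(t));
        }
    }
}
```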
So how does this work: what happens if there's one that's not in your source list?

Okay, so as of now, even if it's not in the source list, it will go into the table; since we need that record, it will go through. But again, what we can do is have someone come in manually, and we can have another flow here which will go to Data Stewardship, or, you know, we can push it into a...

Yeah, we're going to tidy up as many as possible, with a few that might need human intervention or something.
Yeah, yeah. So I think the job is done; this is my customer_new table, and you can see that what we have now are consistent titles, we have column names in the proper format, and we have split out first name and last name. So this is a one-off kind of clean-up that we have done here, but then, based on the use case and the requirement, there are, as you already mentioned, the 900 components, so definitely more is possible.

Yeah, and even straight away, you know, you have parsed the name, so if you're doing a mail shot or something, you can address the person as Ciarán or Aditya; and that's where you've actually added quality, improved the dataset already. And of course,
how do we get the names? We know there's a zip code there; how does it know it's a zip code? Is that using a standard, because it's done a semantic analysis on the column, that it could recognize that's a zip code?

No, as of now it's manual: I know that these columns correspond to these names, so that's what I have put in. But again, we also have functionality where, if somewhere a zip code is missing but we know the address, we can search using a Google address geocoder; we can just fire it up and see what the zip codes are. But as of now, I knew that, okay, this column corresponds to a zip code. And yes, definitely we do have semantic analysis, so based on that it can identify which column has what kind of data. For example, it can identify that these are addresses, or these are email IDs. Or you can basically define your own semantic library based on your regex rules; you can define your own set of rules and then create your own semantic library as well.
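As a rough Java illustration of semantic analysis by regex rules: inferring a column's type by testing its values against a small rule library. The two rules and the 80% match threshold are assumptions for the sketch:

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// A sketch of semantic type detection: test a column's values against a
// library of regex rules and pick the rule most of the values satisfy.
public class SemanticDetector {
    static final Map<String, Pattern> LIBRARY = Map.of(
            "email",    Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+"),
            "zip_code", Pattern.compile("\\d{5}(-\\d{4})?"));

    public static String detect(List<String> values) {
        for (Map.Entry<String, Pattern> rule : LIBRARY.entrySet()) {
            long hits = values.stream()
                    .filter(v -> rule.getValue().matcher(v).matches()).count();
            if (hits >= values.size() * 0.8) return rule.getKey(); // 80% threshold
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(detect(List.of("a@b.ie", "c@d.com", "e@f.org"))); // email
        System.out.println(detect(List.of("90210", "10001", "54321-6789"))); // zip_code
    }
}
```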
So, moving forward: since we now have a cleaner CIF database, and we know that the discrepancy exists, the next thing I have done is create this job. What it does is identify all those records which the CIF database is treating as active but which are not in an active state in the CRM database. The idea is that the data discrepancy exists, and someone needs to come and manually clean it, or manually validate it. For example, a customer manager might have the authority to change the status of a customer, or change the status of, say, a subscriber in the CRM system; definitely not all of the IT guys should have access. So the idea is that a customer manager has the authority to manually come in and change the data, so that we have consistency. So I'll run this workflow. What it's doing is identifying all the records which are inconsistent, and it's putting them through to a customer manager, who is basically a data steward, and he can then see on his front end how many records are inconsistent and need validation.
validation so if i go to my
data stewardship application now this is
a web based
page that i am accessing and if i
refresh it
so now the idea is that i am a steward
and i am a
validator as well so since we have
pushed those 28 records
so you can see that there were 28
records which were not consistent in
both the systems
and you can see that there are
unassigned 28 tasks
so this is the task that we will be kind
of completing today so verify customer
status so what i'm doing is
i am seeing how many other unassigned tasks
tasks
so these are the 28 tasks and what i'll
do is i'll
quickly assign it to a data stewardship
steward sorry
so here i'll treat myself as a steward
so that i'll be coming and basically validating
validating
the face of these records so i have uh
assigned them
Yeah, because in a lot of organizations you'd have one administrator who assigns the tasks out to multiple stewards; it would be very rare that you'd have a single person who knows all these records.
So, since I have assigned the tasks to myself, I will now go in as a validator. Here I've logged in, and you can see that this is the data inconsistency that we have: basically this customer type should be Customer rather than Prospect in CRM, because they are active in CIF. Now, as a customer manager, I have the authority to change the status of that. As Ciarán already highlighted the difference between data preparation and data stewardship: the data preparer can't change the source data, but a steward, as in this case, is changing the data at the source level itself. Again, I have selected this customer type column, and again, this is a pretty handy front end where you get a lot of tools to manipulate the data that you have. What I am doing is searching for "Prospect" and basically just replacing it with "Customer": I have checked, the customer manager has checked, that these 28 are active customers, and the customer type should basically be Customer in the CRM.
And again, just for the speed of the demo we did it in bulk, but in practice a steward would probably click through the ones he's going through and do this replace-and-check one by one, and in some cases say, no, I'm not doing that one, because it is a prospect; I know this prospect, and I need to push it back and say they shouldn't be a customer. And again, all those tools on the side: you'll see the semantic analysis there again, figuring out the data, but we've also got things where we can analyze the information and decide if it's a good idea to tidy it up, or do whatever we need to do. And of course you've also got the survivorship ability there as well, where if we have two records the same, we can say we'll take the customer name, or the prospect name, from the CRM and use that. Definitely.
So I have changed their type, and what I'll do now is mark these tasks as ready to be submitted. You can see that all of these are now logged for validation; so basically what we have done is change their customer type. Once this task is done, I'm running another workflow. What it does is basically, you know, get all the resolved tasks from the data stewardship app, update the CRM table with the amended records, and also run the data quality report. Initially we ran it manually from the profiler, but we can configure the same report in our data integration tool, so rather than someone going in manually and doing it, we can automate it and it will run automatically every time the workflow runs.
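A minimal Java sketch of that follow-up step: applying the steward's resolved decisions back to the CRM table as one batched update. The resolved-task list, table, and column names are assumptions; in the demo this workflow also re-runs the data quality report:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// A sketch of applying resolved stewardship tasks back to the source table.
// The resolutions would come from the stewardship campaign, not a literal list.
public class ApplyResolutions {
    record Resolution(long customerId, String newType) {}

    public static void main(String[] args) throws SQLException {
        List<Resolution> resolved = List.of(        // placeholder for the
                new Resolution(101, "Customer"),    // steward's resolved tasks
                new Resolution(102, "Customer"));
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/telecom", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 "UPDATE crm.customer SET customer_type = ? WHERE customer_id = ?")) {
            for (Resolution r : resolved) {
                ps.setString(1, r.newType());
                ps.setLong(2, r.customerId());
                ps.addBatch();
            }
            ps.executeBatch(); // one round trip for all corrections
        }
    }
}
```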
So hopefully now we have cleaned the CRM table, and what we expect is that both our databases are consistent. I'm just letting this report run; the report uses JasperReports, and these are basically JRXML format, and the good thing is you can modify a report as per your needs, so even that functionality is there. So I think now the report is generated, and we can see that these are the 318 from the CIF database, and this is the CRM database; now it's showing 318, rather than the 290 that we had before. So now it says that, okay, CRM and CIF are both consistent. Again, you don't have to go and manually check in the database and query it; you can just get a clear report. And not only that: we can have an evolution report, where Talend basically automatically stores your prior data and reports, and every time you run it, it will show you the trend over time. You can see that I ran it before, on the 3rd of May, and then we had a long weekend; this is the 7th of May. But anyway, you can see here that in CRM we have three different types of customers, the beneficiary, the customer, and the prospect, and it shows the difference over time: there were 290 active customers, and now what we have is those plus the 28, so 318, sorry.
Using data quality reports like this, it's kind of easier, and with the help of Stewardship what you have is data quality in motion rather than a one-off process.

Yes, because again, the stewardship could be executed every week, so once you've done your big data clean-up and you've got your data going, that's where the data quality in motion kicks in: when someone hasn't updated the CRM, or hasn't updated the CIF, that week, they'll be notified in their area to say, actually, you need to update your system. That way you're never getting to the stage where there's a big data quality issue; you're always only a few days out. And the people who are fixing the data will learn, because if you keep getting prompted with these tasks, you know, you're going to get sick of that and start doing it right. And then, separately, the management can actually start saying, well, actually there's an area here where we need to either change the source system, because it's just not allowing the right data to be captured, or change the processes, and start telling people to do the data quality properly.
Definitely. So I think that was a small kind of demo on how we can put data profiling and data stewardship together.

So, back to the slides: why Talend? Talend is a native solution; the Talend code is not proprietary, and as you've seen, it generates native Java and SQL code. It's open: Talend is committed to open source and open standards, so there's no vendor lock-in or black box, and you can leverage existing skill sets. If you have existing SQL code or Java code, you can actually build that into the flow line and integrate and enhance it. It's a single platform, so you can consistently reuse and leverage resources,
speed up documentation, and it enables incremental adoption. I think a big thing today: in a very short webinar, like 40 minutes, we've shown you the tools for doing everything from good data quality profiling to how to integrate your data sets. And some of the things we haven't shown are capabilities in Talend where you can use things like machine learning, embedding that in integration flow lines to actually automatically fix data. So where data can be fixed, where we can do data clean-up, we can use, as Aditya mentioned, regular expressions, so if there's a typo in a name or an asset type, to fix it. And then, where all this technology doesn't actually hit that last five percent, we can get people involved to actually fix the information.
A big area with Talend is that it's also predictable: the pricing model is predictable. With some systems there's a worry that if you put too much data through, you might get a shock, or an increased price, because you ran over your pricing tier. Not with Talend: it's subscription pricing, and you pay per developer. So you're not paying by CPU; if you do stand up, you know, a very, very big system, and you want to have lots of servers and all that kind of thing, you're not going to get penalized by Talend. Also, there's no additional cost for the adapters, so you can use all of them; it's open source.
So, we're slowly coming to an end. Let me just check with Tanya whether there are any other questions that have come in. No, no new questions; I think everything that people had, we've answered today. Okay. Well, I'd just like to thank Aditya and Tanya for their support this morning. I hope you found the webinar useful. There will be another webinar with Talend next quarter; we haven't decided what it will be yet, but Aditya will be playing with something, and we will show it to you. If you do have any questions on anything you've seen, please contact us; you can email me at IMGS, at imgs.ie. All right, thank you for your time, and I hope you have a good day. Thank you.