YouTube Transcript: FinOps Transformation: Unleashing Comprehension and Agile Data Models
Video Summary

Core Theme: Salesforce transformed its FinOps capabilities by re-architecting its data models and infrastructure to be more agile and comprehensive, moving from manual, spreadsheet-based reporting to a unified, data-driven approach that empowers innovation and builds trust.

Video Transcript
Thanks, everybody, for joining us. This is "FinOps Transformation: How Salesforce Unleashed Comprehensive and Agile Data Models Over the Last Year." My name is George Parker. I'm the Director of FinOps Product Management at Salesforce for internal products, and I also oversee capacity product management.

I'm Danush, the lead FinOps engineer for Salesforce. And a special shout-out to Will Forester, who is probably going to watch us on YouTube at some point. He's our Director of Engineering, our close partner in crime, and he helped us put a lot of this content together today.
A little bit of background before we talk about the details. Salesforce has been running in the cloud for a long time; if you know Salesforce, we pretty much started in the cloud. It was private cloud 25 years ago, but first announced in 2020, we made a pivot to public cloud. We call it Hyperforce. It's a cloud-native architecture that empowered all of our customers' agility: customers that had been running for decades in on-premises data centers were able to port their custom applications and integrations to public cloud. And as everybody here knows, if you pivot to public cloud, you're going to need to change your FinOps tooling, so that's what we've been doing for the last few years.

Danush and I specifically are from the Cloud Economics and Capacity Management team. We provide data analysis and forecasting capabilities for Salesforce infrastructure, and we're one of many teams focused on availability and infrastructure engineering within the broader Salesforce engineering ecosystem.

One of the consistent themes we found about our FinOps at scale is that it is a big data engineering problem at the end of the day. This is because of our scale, but for the die-hard engineers, you're probably going to notice some standard engineering principles that we required in order to scale throughout this presentation. As we tried to address our FinOps concerns, we applied a lot of things that you're probably already doing in your standard engineering teams, so in some ways this shouldn't be novel or new for the engineers, but how we apply them to the FinOps problems is what's unique here in the FinOps context.
Our broad team includes both FinOps practitioners and cost engineering that builds supporting tooling for showback, cost allocations, budgeting, and a whole bunch more. In the following presentation, we're going to deep dive into some of the specific methodologies we used to build our data lake, how we evolved our data models, and how we approach quality as a topic.

We're also not the only team at Salesforce that cares about FinOps. We have some great finance partners here (they're actually going to go listen to Apple right now). We also have a partner sourcing team sitting in the second row; we couldn't do it without them. Our colleague Rahil, whom I just met, is from Slack, part of the Salesforce ecosystem as well. There are a lot of people at Salesforce who care about FinOps, and if you've been listening to the keynotes, you know that it takes the whole community, the whole company, to think about FinOps as a problem. I think Salesforce is a good example of that. We also have some service and budget owners from across the business who represent individual service teams at Salesforce.

There are largely two major orbits of FinOps personas, and our team sits a little bit in the middle, as do sourcing and finance. On one side are the folks that drive corporate strategy: FinOps, finance, execs, and procurement. For this group, our team holds a monthly budget review and tracks weekly KPIs to ensure that we're on track for our budgets and our strategic objectives. To be successful with this audience, we need to solve for high fidelity and extremely rigorous data quality while accelerating these teams through simple-to-use executive dashboarding. Then there are the folks actually driving the change: engineering, product, sustainability, and service owners. For this audience, we focus on building self-service tooling, so to be successful here we focus on making the tooling flexible and on providing deep insights that cater to services.
Today we'll share how we applied agile principles to our FinOps architecture, which ultimately transformed our team from falling behind to innovating over the last two years. Back in 2022, a lot of the reporting done for the executives was all in Google Sheets, and it required a lot of manual time to build. Service owners weren't happy with the tools that existed because they weren't really meeting their needs, and all the work we were doing wasn't having any customer impact. So at the beginning of 2023 we went back to the drawing board and proposed a complete greenfield re-architecture.

To make Salesforce FinOps successful, we focused on re-architecting with a few key agile principles in mind: reacting quickly to changing requirements, continuously reflecting on the progress we had been making, continuously collaborating with our customers to drive product engagement and scale, and consistently improving through incremental investments. With all of these agile principles, within six months we had decommissioned the previous products and rebuilt them from the ground up. During the process we also cleared off the majority of the technical debt. Then we began to innovate, and we delivered numerous new FinOps capabilities that had been requested over the previous years. Our users and customers were happy, and we managed to turn around the internal customer sentiment. The business needs still continue to change, but now we are able to deliver with agility, all while upholding that customer trust.

Throughout this talk we'll be sharing some of the key learnings from the transformation. There were three major factors that were critical to our overall success, and we're going to structure this presentation around them: first, leveraging a unified data lake that enabled us to react quickly to change; second, designing for flexibility to support the actual business needs; and finally, continuously inspecting and improving our data quality.
With that, I'm going to start with the unified data lake, because you can't develop a whole lot of FinOps maturity without a lot of data, and at scale (again, I want to reinforce this) it is a big data problem that requires big data solutions. We'll pause here because I know people like to take pictures; this is just an architecture slide, so I'm going to pause on it so you can take that shot. We're going to walk through this slide in a little bit more detail throughout the deck, highlighting a few of these blocks as we talk about them.

Our first bottleneck to the team's throughput was that our data was in disparate sources. Prior to 2023 we were accessing and reading from a lot of different data sources. Most of your FinOps capabilities require billing data to be joined with other data sets: things like infrastructure metrics, CPU utilization, logs that show customer usage which you might need for unit economics, and metadata like team ownership mappings that help with your governance plans. Every time we needed to onboard a new data set back in 2022, any time a new user request required us to access new data, it would take more than a week to develop, deploy, and ingest a new metric. This created a huge tax on basically every feature request we had; anything to do with expanding our existing FinOps capabilities cost at least a week just for the overhead to get the data. Beyond new ingestion, this slowed our resolution when we had operational impacts or performance and availability challenges, and all of that contributed to user frustration with our toolchain.
To overcome this issue, we made a strategic decision to move off a custom on-prem platform and migrate to a Salesforce-internal, cloud-based unified data lake where the metric logs and metadata were already hosted. We then had to onboard the cloud billing data to the platform, which we did by configuring cloud-native replication. Cloud-native replication provides a low-maintenance, seamless way to ingest billing data into the data lake. By colocating all of the key data sets needed to build FinOps capabilities, we could now join the cloud billing data to either metric logs or metadata by simply switching table references. We also removed the need for, and the operational overhead of maintaining, the previous ingestion pipelines.

By unifying all of these key data sets in the same data lake, you can bring significant agility to your team. This reduced the data ingestion effort from about one week of team effort to less than one hour, which helped us develop 3x new FinOps capabilities in a year with exactly the same team size. We were able to add a new budgeting structure, self-service unit economics, and a workload optimization dashboard. We also improved our shared cost allocation coverage from 70% to 95%, which represents the majority of the Kubernetes cost, and that's also one of the biggest spend drivers.
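As a rough illustration of what "joining by simply switching table references" can look like once billing, usage logs, and metadata are replicated into one lake, here is a minimal sketch. The table and column names (cloud_billing, usage_logs, service_ownership, and so on) are assumptions for illustration, not Salesforce's actual schema.

```python
# Minimal sketch: once billing, usage logs, and ownership metadata live in the
# same lake, a new FinOps metric is "just another join". All table and column
# names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("finops-lake-join").getOrCreate()

cost_per_request = spark.sql("""
    SELECT b.service_tag,
           o.team,
           SUM(b.cost)                        AS daily_cost,
           SUM(u.request_count)               AS daily_requests,
           SUM(b.cost) / SUM(u.request_count) AS cost_per_request
    FROM   cloud_billing     b
    JOIN   usage_logs        u ON b.service_tag = u.service_tag
                              AND b.usage_date  = u.usage_date
    JOIN   service_ownership o ON b.service_tag = o.service_tag
    WHERE  b.usage_date = date_sub(current_date(), 1)
    GROUP  BY b.service_tag, o.team
""")
cost_per_request.show()
```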
So once we were on the unified data lake, our next focus was on making the data models flexible.

Thanks, Danush. That's right. Looking back at the lesson about the unified data lake: if you're running your systems out of multiple different data sources, think about how you might unify them. And once you unify them, you need to start thinking about the flexibility of the data model you're putting them in, because as you define your unified data lake you're setting schemas that will influence how you access and read data in the future.

I'll give you an example from back in 2022 and early 2023, when I switched into the FinOps team last year. I have a history of doing FinOps at other companies, but it was in a different role. In the beginning of 2023 we had really the opposite of a flexible data model; I would call it death by spreadsheet. At Salesforce we have tens of thousands of accounts, so just tracking the new accounts created every month meant that it took one person about three days every month just to add the new account mappings to our data sources so that we could do financial reporting on them. It was way too much; the pace of change was literally faster than we could keep up with. New accounts would just be missed in the monthly reporting cycles and have to be caught up the next month because they simply weren't there.

We've simplified what that looked like on this slide to show just a few pieces of the data, but at the time we maintained all of this by hand in a spreadsheet with about 200,000 cells: roughly 20 columns and 12,000-plus rows. Each row was uniquely indexed by the account number, and it was consistently growing by hundreds of rows every month. In that data source there were actually only a few pieces of interesting data, shown in the table on the left: for us it was something we call a domain, plus the product and the executive that go along with it, the exec that footed the bill and the product it was under, all uniquely indexed by the account. Much of the rest of the data in the other 16 or so columns was just duplicates, copied and pasted over and over again. That in turn made it impossible to tell the difference between an exception and an error: did somebody accidentally drag a row down, and that's why it's different for this one account, or is it because this account really is special? Any new entry was obviously slow to be updated, and it was very difficult to extract the budget relationships between these 12,000 to 15,000 different accounts and tell which services belong to which executives without going through line by line. So if you have a data source like this that you could ascribe to death by spreadsheet, think about how you might remove this anti-pattern.
Instead, what we do now, as of the last year, is model this structure as a set of key relationships. We distilled it down to two: the domain that maps to the product, and the product that maps to the executive. This eliminated all of that duplication of data. We actually don't even need the account ID anymore, which used to be the unique field that made us generate all of those rows; we don't need it because the accounts are all tagged with that domain already, and we can roll that up to product and executive. It's much simpler, and it's higher quality: any exceptions are clearly called out in a separate set of logic, so we ensure there are no errors in this. It's faster and easier: new entries can now be added on a self-service basis by the FinOps teams, and all the downstream tooling reflects the change within about 12 hours. In a lot of cases, when a new account is created we don't even need to make any edits, because we're no longer looking at the unique account ID; if the tags are correct, it's automatically processed and it shows up in the billing the next day. It's also a lot more explainable. The whole thing is about 200 simple lines of configuration that we frequently share with budget owners and executives who want to understand what they're being charged for. I'll literally get a Slack message that says, "Hey, this account showed up on my list the other day, I don't know where it came from," and I can just respond, "Here's the mapping for the whole fleet; it'll take you five minutes to read, just go skim it." I usually get responses like, "Wow, I was always looking for a way to see the overview like that, thank you."

This transition to a config-driven, or what Salesforce oftentimes calls a "clicks not code," approach means that the FinOps practitioners are able to self-service a whole lot more than they could in the past. Last year alone, our FinOps team made over 500 configuration updates in Git itself, updating the budget reporting on their own. We currently manage all of this as YAML on Git, an approach some of you might already be taking as well. I like the Git workflow; I think we all do. It enables audits, CI workflows, and tracking, and that's really helpful for us: we get peer reviews when changes are made, and we have a reference to who made the change and why. If you can replace some of your tabular anti-patterns, some of that death by spreadsheet, with this kind of relationship structure, it is simpler, it is faster, and it is more accurate. And it's even better when it's self-service and accessible to all your non-technical users.
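To make the "relationships, not rows" idea concrete, here is a minimal sketch of what such a YAML-on-Git mapping and its rollup logic might look like. The domain, product, and executive names, and the exact file structure, are illustrative assumptions rather than Salesforce's real configuration.

```python
# Hypothetical sketch: two small mappings replace thousands of spreadsheet rows
# keyed by account ID. An account's domain tag rolls up to product and executive.
import yaml  # pip install pyyaml

CONFIG = """
domain_to_product:
  commerce-search: Commerce Cloud
  einstein-serving: Einstein AI
product_to_executive:
  Commerce Cloud: exec-a
  Einstein AI: exec-b
"""

def rollup(domain_tag: str, cfg: dict) -> tuple[str, str]:
    """Resolve an account's domain tag to its product and executive."""
    product = cfg["domain_to_product"][domain_tag]
    return product, cfg["product_to_executive"][product]

cfg = yaml.safe_load(CONFIG)
print(rollup("einstein-serving", cfg))  # ('Einstein AI', 'exec-b')
```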
That account-mapping example works great for account-level internal billing, but what about resource-level tagging? Some of you might be guilty of this: my last company had insane tagging requirements, like 12 tags per resource, label it with every known piece of metadata you could possibly need. That's the way my last employer did it, and I'm happy to say we don't do that at Salesforce. The problem with that approach is severalfold: it is practically impossible to add or update tags retroactively, and you're going to have human error. At my last company we never really could get FinOps coverage past 95%, mostly due to that human error component.

To solve this at Salesforce, we really only care about one tag now. That one tag is enforced through our corporate objectives, it's reported on globally, and it's enforced in CI pipelines; we add OPA policies when gaps emerge and make sure everybody is using this service tag. The tag needs to be very granular: it ties back to the individual microservices being deployed. Pretty much everything else, shown here in yellow, is derived from a metadata API that we call for all of the other references. Using the service tag and referencing it against the metadata API, we can map it back to teams, to managers, and to engineering orgs, and thanks to our unified data lake we can further join it to that account data, so we know all the details from that interface, adding in context about environments, budgets, and products. As a result, we've improved our reporting coverage from 95% to about 99.9%. We can also now backfill, which is hugely critical for us, providing the capability to retroactively associate new metadata with existing resources. This gives us tremendous flexibility to make a change in the middle of the fiscal year while still offering consistent fiscal-year reporting without weird spikes or jump cuts in our data, and we can even do A/B testing, which we use to develop next year's framework while still reporting on this year's.
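A minimal sketch of the derive-from-one-tag pattern described above, assuming a hypothetical internal metadata endpoint; the URL and field names are illustrative only.

```python
# Hypothetical sketch: everything beyond the enforced service tag is derived at
# processing time from a metadata API, so ownership changes (and backfills) do
# not require re-tagging resources. Endpoint and fields are assumptions.
import requests

def enrich(billing_row: dict) -> dict:
    tag = billing_row["service_tag"]
    meta = requests.get(f"https://metadata.internal.example/services/{tag}").json()
    return {
        **billing_row,
        "team": meta["team"],
        "manager": meta["manager"],
        "engineering_org": meta["org"],
    }
```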
A challenge we ran into when developing new FinOps capabilities at Salesforce was the sheer number of services that exist here. There are about a thousand-plus services, and a central FinOps team trying to build a new FinOps capability for every service, each with varying implementation needs and methodologies, wasn't a scalable approach. So what we focused on was externalizing the implementation details and providing generic processing capabilities.

One example where this was really helpful is unit economics. At Salesforce we care a lot about unit economics: whether it's customer usage driven by internal or external demand, it can drive material cost swings for services. Service owners can't control the demand, but what they can focus on is whether they're staying consistently efficient as scale varies, and unit economics is one metric that helps with that. To onboard a unit economics metric, a service owner writes a YAML-based configuration; YAML provides a simple interface, even for non-technical users, to write configurations. The YAML configuration is then submitted to the onboarding interface, which in our case is Git. As George just said, it provides easy version control and audit tracking. Daily pipelines then ingest all the configurations from the Git process and publish the data to the live Tableau dashboards. Customers can look up their newly processed data on the dashboards and track metrics over time, based on their backfill needs. This has enabled service owners to onboard either a unit economics capability or a cost allocation framework and publish data to live dashboards in less than a day, with almost zero engineering effort from our team.

Diving a little deeper into the onboarding interface and the configuration itself: we capture as much key metadata as possible as part of the interface, across about four areas. One is data governance: who owns the data, and what the classification of the data set is, whether it's internal, restricted, publicly allowed, and so on. Then retention: how long users want to keep the data, whether that's 6 months, 12 months, 13 months, and so on. Then documentation: the documentation field highlights the key input and output metric descriptions, and you can see how, based on this information, we can generate an automated data catalog from all the configurations. The fourth, and key, part is the metric calculation itself. In our case this is a SQL query; the query allows parameters to be filled in at runtime, and the pipelines execute those SQL queries. An example query joins yesterday's infrastructure cost with yesterday's customer usage to calculate the unit metric.
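Here is a minimal sketch of what such an onboarding configuration might look like, with the four areas described above (governance, retention, documentation, and the parameterized metric SQL). Field names, table names, and the query are illustrative assumptions, not the actual Salesforce interface.

```python
# Hypothetical sketch: a service owner commits a YAML file like this to Git; a
# daily pipeline loads it, substitutes runtime parameters, runs the metric SQL
# against the lake, and publishes the result to the live dashboards.
import yaml  # pip install pyyaml

ONBOARDING_CONFIG = """
metric_name: cost_per_api_request
data_governance:
  owner: search-platform-team
  classification: internal
retention_days: 390            # roughly 13 months
documentation: >
  Joins yesterday's infrastructure cost with yesterday's customer usage
  to produce a per-request unit cost.
metric_sql: |
  SELECT c.service_tag,
         SUM(c.cost) / SUM(u.request_count) AS unit_cost
  FROM   infra_cost c
  JOIN   customer_usage u
    ON   c.service_tag = u.service_tag AND c.usage_date = u.usage_date
  WHERE  c.usage_date = DATE '{run_date}'
  GROUP  BY c.service_tag
"""

cfg = yaml.safe_load(ONBOARDING_CONFIG)
query = cfg["metric_sql"].format(run_date="2024-06-19")
# The daily pipeline would execute `query`, publish the output, retain it for
# cfg["retention_days"] days, and catalog cfg["documentation"].
```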
Previously, our primary data model consisted of a single cost output that we called total cost. This was our approach in 2022: it represented the budget impact of a given service, inclusive of discounts, fees, and credits. We ran into limitations with this approach, as it created a monolithic cost number that lacked any detail needed for further analysis. Just one example of how this penalized our efforts: as we began to receive a large amount of one-time credits on random accounts, it created anomalies in our budget run rates for all those services. Execs wanted the one-time anomalies hidden; they wanted to see consistent, predictable numbers without jump cuts in the data, so they asked us to exclude those credits from total cost to stabilize the trend. Unfortunately, that older, less agile data approach limited our ability to execute the change quickly. Excluding credits meant creating separate fields, migrating every stage of our processing pipelines to support those new fields, and then backfilling all of that data. It took us over a month, all while the engineering leaders were fundamentally confused or misled about what their budget impacts were.

To overcome this pain, we redesigned our data models for end-to-end agility, with the goal of being able to react to changing requirements, like the example George quoted, in less than one week. To achieve this target, we introduced a field called charge types. Charge types are nothing but the cost categories which previously formed the composition of total cost. Instead of storing a single output called total cost, we now store cost broken out by the array of charge types useful to Salesforce; some examples are usage charges, commitment plan discounts, credits, and so on. Total cost is now defined as a combination of these charge types, and you can see how we could include or exclude credits in less than a day and publish to our customer-facing dashboards based on the business requirement. This also saves us from schema migrations when adding new charge types, because we add them as rows instead of columns, which avoids the effort of migrating all of our ETL steps.

Beyond just the agility, this data modeling helped us provide deeper insights that previously weren't possible. An example use case: an engineer wanting to track a specific architectural optimization, without the added noise of discounts and credits, could do so by looking up only the usage charge type and the cost associated with it, focusing on that one metric to see how their optimization was going.
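A minimal sketch of the charge-type idea: cost is stored one row per charge type, and "total cost" becomes a configurable sum, so credits can be included or excluded without a schema migration. The data and names below are illustrative.

```python
# Hypothetical sketch: one row per (service, charge_type) instead of a single
# monolithic total_cost column. Numbers are made up for illustration.
rows = [
    {"service": "search", "charge_type": "usage",               "cost": 1200.0},
    {"service": "search", "charge_type": "commitment_discount", "cost": -150.0},
    {"service": "search", "charge_type": "credit",              "cost": -400.0},
]

def total_cost(rows, exclude=()):
    """Total cost as a combination of charge types, optionally excluding some."""
    return sum(r["cost"] for r in rows if r["charge_type"] not in exclude)

print(total_cost(rows))                      # 650.0  -> full bill impact
print(total_cost(rows, exclude={"credit"}))  # 1050.0 -> stabilized trend, credits excluded
```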
Once we had the unified data lake and had designed our data models for flexibility, our next focus was on how we could continuously improve the quality of the data we were publishing to our customers.

The last major piece of this presentation is around data quality. Data is the foundation of FinOps, and if that data is not accurate or not available, the business runs the risk of making decisions on incorrect data, and it will break trust with your users: they won't act on the insights the FinOps team is providing, and it will put your organizational goals at risk. We want to encourage you to build your brand around data competency and maintaining trust in the data, and to continuously prioritize inspecting and improving that data quality.

I'm going to highlight a couple of external artifacts you might want to reference. There's a really good white paper on the FinOps Foundation site called "Building and Maintaining Healthy Working Relationships for FinOps Practitioners," and it talks about the deep importance of trust for FinOps teams; we couldn't agree more. Another reference: in the marketing world there's a concept called brand personality. I'm not a big marketing person, but back in the '90s a marketing industry legend, David Aaker, wrote a framework that said all brands basically subscribe to one of a few specific personalities, listed here: sincerity, excitement, competence, sophistication, and ruggedness. We use this lens to talk with our team and reinforce the importance of what we call data competence. We're not trying to build the most rugged FinOps team, and we're not trying to build the most sophisticated or sincere team, but we want to be really competent with our data. We've talked about why incorrect data creates risk: it hurts trust, it creates operational risk around making bad decisions, it creates trust issues that prevent cultural adoption, and it compromises your accountability structures and governance. To build your brand around data competency and maintain trust in the data, that's the importance of prioritizing the continuous inspection and adaptation of your data quality commitments.
To achieve data quality, we first created a framework that allows our pipelines to run automated data quality checks. Newly processed data is published to customers only if all of the data quality checks pass; if any of them fail, an automated alert is sent to our engineer on call for immediate triage and resolution. Once we had the framework in place, we brainstormed and added data quality rules throughout our software development life cycle. Before releasing a product, we incorporate data quality as a design review component; as part of PR reviews, we make sure every change being pushed has the necessary data quality check associated with it; and in the case of outages and root cause analysis investigations, we brainstorm how a future data quality check could prevent such a scenario. Overall, the data quality framework and the rules have helped us prevent issues in numerous scenarios and helped maintain customer trust. A notable recent one was when a few-million-dollar discrepancy in a specific charge type calculation, caused by a billing column definition change, was automatically caught; the team was able to triage and resolve it, and the incident was prevented from reaching the customer.
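As a minimal sketch of the kind of automated gate described above: newly processed data is published only if it reconciles against the source within a tolerance, and otherwise the on-call engineer is alerted. The threshold, inputs, and hooks are illustrative assumptions.

```python
# Hypothetical sketch: a reconciliation check that gates publication. A charge-type
# total that drifts from the source billing total (e.g. after a billing column
# definition change) blocks the publish and pages the on-call engineer instead.
def totals_match(processed_total: float, billing_total: float,
                 tolerance: float = 0.001) -> bool:
    """Pass if the processed total is within 0.1% of the source billing total."""
    if billing_total == 0:
        return processed_total == 0
    return abs(processed_total - billing_total) / abs(billing_total) <= tolerance

def publish_if_clean(processed_total, billing_total, publish, alert_oncall):
    if totals_match(processed_total, billing_total):
        publish()
    else:
        alert_oncall(f"Charge-type totals diverged: {processed_total} vs {billing_total}")
```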
Once we had the data quality framework, we had a way to measure and report on the quality of the data; it was then important to call our shots and follow through. To us this means setting SLOs and making sure we follow them by tracking and optimizing for them. We track our SLOs across three key metrics: availability, where we want all of our data to land in less than 24 hours; accuracy, where the two sources of data we compare should match to 99.9%; and latency, where we want the dashboards to load in less than 5 seconds. SLOs in our case are service level objectives, which we distinguish from service level agreements: these are our targets, this is what we measure, and this is what we manage.
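For illustration, the three SLO targets quoted above might be tracked with something like the sketch below; only the thresholds come from the talk, while the structure and the measured-value keys are assumptions.

```python
# Hypothetical sketch of the three SLOs as trackable thresholds.
SLOS = {
    "availability": {"description": "data lands within 24 hours",   "max_hours": 24},
    "accuracy":     {"description": "sources reconcile to 99.9%",   "min_pct": 99.9},
    "latency":      {"description": "dashboards load in under 5 s", "max_seconds": 5},
}

def evaluate(measured: dict) -> dict:
    """Compare measured values (hypothetical keys) against each SLO threshold."""
    return {
        "availability": measured["landing_hours"] <= SLOS["availability"]["max_hours"],
        "accuracy":     measured["match_pct"]     >= SLOS["accuracy"]["min_pct"],
        "latency":      measured["p95_load_s"]    <= SLOS["latency"]["max_seconds"],
    }
```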
We meet regularly to look at those service level objectives and review the key business outcomes. One of the things that's really important for us to look at is our support ticket load: how many incidents occurred, how many users were impacted, and what they're asking for, whether it's enablement, issues, or feature enhancements. We also look heavily at monthly active users, which reflects how people are engaging with our products; we currently have about 1,000-plus internal monthly active users using our FinOps tools.

Beyond just measuring and tracking these numbers, it's also important to act on the improvements you identify. That could mean prioritizing actions from a recent root cause analysis. It could also mean improving developer productivity, having a conversation between engineering and product to say we really need to do this so that next release we can go faster. Another piece of prioritizing engineering changes is simply communicating the impact to our users and following up on the RCA; that builds the trust that our team is being diligent and competent with our data. By creating this positive feedback loop, we improved the availability of our tools from about 81% to 98% within the last year, and we believe this brand of data competency helped drive FinOps cultural adoption, leading to the monthly active user improvements on our flagship products.
The last piece of that strategy for quality is our UI. With our old architecture, a custom web app, we had a very limited, basic explorer capability, and it was relatively expensive and slow to add new features and functionality. By switching to a BI tool we got tremendous velocity gains. Being able to just drag and drop our data was itself a huge boon, but we also gained a whole host of other features we would otherwise have had to build ourselves: things like alerting and subscriptions, which I can use to track missing data; sharing capabilities that help me quickly troubleshoot a customer's issue, because they can send me a URL directly to the view they're looking at and I can do the same; and while engineers are troubleshooting production issues, the engineering manager or product manager can go in and add a data warning to the UI, so any customer who then looks at that dashboard is alerted that the data may be faulty and that we're investigating.

This approach works well for us; we're big fans of Tableau. Some of you might be using other tools, and that's totally cool; we just want to encourage you to fall in love with your BI tool and see if you can get velocity gains out of it. A lot of you might also use your cloud provider's native explorer capability, and that's fine too (I was happy with it at a previous company), but if you do have access to a BI tool, I'd encourage you to take a look. We use it to build other capabilities beyond just exploration, which is all we offered when we were a custom web app. Because of the speed gains from developing on a BI tool, we now have self-service unit cost dashboards, we have budget reporting, and there's a whole lot of other stuff that was being done in spreadsheets and slide decks that can now move over.

Our flagship product, not shown here, is our internal Hyperforce Cost Explorer (I told you about Hyperforce at the beginning); again, about 750 users every month. We have an infrastructure budget dashboard used by all the engineering EVPs and their budget and operations teams to track cloud cost performance. We also now provide a dashboard where internal customers can look up all of their onboarded unit economics, with the ability to slice and dice the metrics by some of the key dimensions. They don't have to build a UI or any of the reporting; all they really have to do is set up that YAML file Danush showed us, set their query, and tell us what their unit metric is. We already have all the cost data, so it's very fast: about six service teams onboarded in the last two months, giving service teams the ability to build their own unit economics and leverage the framework we built to do it faster. Because of the power of our BI tool, we also keep a staging version of all of our dashboards; if we're making changes, it's very easy for us to avoid standing up another custom web app and simply keep another copy of our dashboard where we treat the production and staging data differently.
We're really excited to continue the journey this year. With the methods outlined here we accomplished a whole bunch of objective, quantifiable KPI things, but I think the real value has been the trust we built, the quality we built, and the way it's changed our organization, seeing that many more people accessing and loving these tools. I've had people from our company come up to me during this conference and say, "Hey, I heard about that thing you guys built, that's super awesome, I'm starting to use it, I might build a version of it myself." That kind of traction doesn't fit neatly on a KPI slide, but it's really the power of what we did in the last year, and I hope that some of the learnings and methodologies we've shared might help influence how you build your tools at your company. It'll free up space for more innovation, more features, and ultimately more user satisfaction and FinOps adoption.

We'll close before we go to Q&A. Again, we're excited to continue the journey: we're building new product features, and we're adding new levels of budget hierarchy, going more granular down into the org. Shout out to Julia Harvey and Ursula from Nationwide, if anybody knows them; they did a great presentation last year at FinOps X on the coin score, and we totally copied that (plagiarism is the sincerest form of flattery); it's providing us more savings opportunities internally. We all have eyes on FOCUS; I'm personally a member of the FOCUS working group and actively participating there, and we're super excited about how it can automate our ETL processing for other vendors. We're implementing new business process changes, adding new financial metrics, improving our margin analysis, and improving our automation. We're looking at layering in some AI use cases; for us right now that means building some SQL capabilities, since we have a lot of people who ask to be able to query the data, and we can very quickly use gen AI to help with that. And we're just excited to come back to FinOps X, for all the reasons that all of you already know: this kind of community, this kind of sharing, is really why we're here. We'll pause here and open it up for questions.
Thank you both. And sticking with the theme of the conference, I will not be running to you with the microphone, but I will lightly jog. Hands up in the air... got one right here. My light jog.

It sounds like you have internal customers who are reading directly from the data lake and building their own reports, their own dashboards, whatever their specific needs are. How do you handle the changing data schema with a potentially large number of internal customers who may be impacted when, say, all of a sudden your schema aligns to FOCUS, for instance?

It's a good question, and by the way, that's Carl from Walmart; he's an active member of the FOCUS group too, shout out to Carl, brilliant mind. We've mostly been able to keep it relatively stable. We have added a few other tables over the last year, and we're having a discussion now as we start working through multiple layers of cost allocation: it's not just distributing our Kubernetes spend, it's distributing the tenant charges on Kubernetes, so now we're having to peel the onion with allocations that may drive some new columns; we were talking about that on Monday at the office. We've mostly been able to follow an evolutionary schema approach where we just add a column; we're not really having to hard-cut people over yet. It's one of the reasons why, going back to the FOCUS discussions we've had around versioning the API, we want to be really, really thorough if we're going to make any kind of breaking change, and I think all of the major cloud providers are pretty conscious of that too. We just don't want to rip a column out all of a sudden. We've talked about adding a new column and calling it "beta column name" instead of the current column name. So we've been working through that; it's case by case, but mostly I don't think we've had any real breaking changes with our tables, we've just added some more.

Yeah, pretty much the same. We have tried to make it backward compatible so far and we have been successful. Some of the other ways we're thinking of handling it are having V1 and V2 schemas, where customers would have to migrate over to the new schema, although sometimes we abstract through views instead of directly exposing the table, so we can hide some of the intricacies from our customers. But yeah, we haven't run into a real breaking-change scenario yet.
I have a question on pricing and showback. If we have teams trying to do forecasts, and then the showback logic: are you using actual prices, like on-demand rates? We bought a bunch of flex CUDs because our resource CUDs expired, so teams didn't make any quantity changes but they saw a big dollar difference, and now it's throwing us off. So how do you solve for actuals, or a rolling 90-day blended average? How are you attacking that?

That's a good question, and that's Brent. Anybody else who has a question, I encourage you to say your name and what company you're with; I'd love to meet the community here. I had dinner with Brent last night; it was nice to sit across the table from him. Forecasting is a little bit of a different topic than we covered today, but it's a really good question. We're still working through some of the details on forecasting; right now we're doing more of a naive trend-based forecast and working to layer the driver-based stuff on top. Most of what we're doing today is either with the raw usage cost or with the total aggregated cost. There's huge value in being able to separate out and see what the engineering impact is and forecast with that raw usage cost, but at the end of the day it comes down to what's the net-net. We do separate credits out, which helps tremendously; we've pulled that out, so to a large extent we're consistent with our pricing agreements and everything like that. But without going into too much detail about all the nuances of pricing rates, I think we generally look at it from a total cost perspective.
I have two questions. Number one: you talked about re-architecting at the level of the deployments, for example, or in Kubernetes?

For EBS storage there are persistent volume claims; either the claims or the persistent volumes are tagged, so we are able to retrieve those, and when we allocate we are able to provide a unified view of a service's cost, whether the service is running within a pod or on its own instances. So we are able to unify that, but tagging has to go beyond just the resource itself, to things like the pod and anything that's deployable.

Just a follow-up: does it take any CPU or memory metric? Because at the pod level, it then comes back to the container.

Yeah, we use the pod-level requests and the CPU cores used, both, to determine the cost showback. Again, it's a broader discussion, but we use a couple of metrics to determine the costs of each service running within a Kubernetes cluster.
I encourage you to stick around and ask Danush more questions about that; he can go deeper on this topic, so if you want to know more, he's the guy. I'll also add that we have our own metadata API that helps us here. You mentioned service owner, and I would distinguish that from service name. We actually tag with a service name; sometimes it's in the pod label, sometimes it's in a resource tag, but we use that service name as the unique index, and then we use the metadata API to tie it to specific service owners and teams. We can roll that up the org chart and say, okay, that goes to that manager, which goes to that VP, which goes to that product, and so on. So we have another API that we also leverage here.

And another question I have is on that charge type you introduced: how do you show the amortized cost?

Can you give an example? I can take that one. To a large extent we're not looking at the amortized charges; we mostly look at it from the monthly bill perspective, so we're not really getting too much into the amortized piece; we're mostly looking at real bill impact. We can extract that out, and if we want to look at real bill impact versus engineering impact, that's where we look at the raw usage cost. We're mostly not doing the amortization piece right now.
I'm curious. I enjoyed the presentation, it was awesome; it actually really was. I know him, so I'm not just flattering him. I am curious, though: how do you deal with shared costs, and give visibility into that for owners?

That's a good question, Ben, thank you, I appreciate that. Ben Hartwell from Splunk, everybody; a great person to talk to as well. We are in the middle of implementing a self-service allocation interface; we're on about the fourth or fifth allocation, and we're getting to the point where it's largely self-service. The first big one we had to solve was Kubernetes, and I'm actually super excited about the split cost allocation data (SCAD) feature from AWS; they're starting to be able to do allocations by breaking apart your EKS charges in the bill itself. They already support this for ECS and it's coming out now for EKS; there's a PM here from AWS who can talk more about that. So that's probably the first one: if you're thinking about allocations at your company, you're probably thinking about Kubernetes first. We did. We solved the Kubernetes piece on our own, and we may switch to the AWS solution, because I like taking stuff from them instead of having to build and maintain it myself.

As we got through that one, there were other obvious cases where we had shared costs that we needed to break up. Some examples are our VPCs, which are shared network resources; our ODCs, which are shared across teams; and a bunch of other networking and shared resources across the board, where tagging with an individual service name doesn't really make sense because a lot of services are all sharing one resource. We first took the approach of working directly with the team doing all the number crunching (again, Danush is your guy if you want to go deeper). We look at the system utilization, take the cost, divide it by the system utilization, and start attributing it out to the different service owners sharing that resource. There are a bunch of different ways to do that: you can take a flat split, you can divide by system resource utilization, you can come up with weighted averages based on P&L; there's a bunch of ways to do that.
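A minimal sketch of the utilization-weighted option mentioned above, splitting one shared resource's cost across the services using it; the numbers and service names are illustrative.

```python
# Hypothetical sketch: showback for a shared resource (e.g. a shared VPC or a
# Kubernetes cluster), attributed in proportion to each service's utilization.
def allocate_shared_cost(total_cost: float, utilization: dict) -> dict:
    """Split a shared cost across services proportionally to their utilization."""
    total_util = sum(utilization.values())
    return {svc: total_cost * u / total_util for svc, u in utilization.items()}

print(allocate_shared_cost(10_000.0, {"search": 55.0, "billing": 30.0, "vault": 15.0}))
# {'search': 5500.0, 'billing': 3000.0, 'vault': 1500.0}
```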
I'd be careful here, though, especially when you get into chargeback. We're really focused on the showback side of this right now; our budgeting is done a little differently, so we do a lot of this in the context of showback, not chargeback. When you get into chargeback, you're actually creating some serious business incentives for different pieces of your org. I was talking to Udum about this earlier: if you take a single large monolith, an existing, stable, mature product, sharing resources with some brand-new startup projects, and then try to do a flat bill across those, you could essentially put those smaller products out of business, because the weight of the cost on them is just too burdensome. Meanwhile, if it's a shared service you're trying to drive adoption of internally, say you're trying to get everyone to use Vault, and all of a sudden there's a $100,000 bill that a service is going to get for enabling Vault, they're going to say, "Wait a second, maybe I want to build a smaller solution myself," and that actually defeats your corporate objective. So you have to be careful when you get into chargeback.

Again, we've implemented this a couple of times over, and we've been pushing toward making it a self-service interface, because our team isn't staffed to go learn the details of VPCs and get into the weeds on how many bits you should charge, or whether you should charge for the IOPS over the storage itself. We're not really staffed to do that; it should come from the team managing that resource, so we've tried to push it to be a self-service discussion and an interface they can feed to us. I think we're also at the point now where we're starting to give this to our central architecture team and say, hey, this has huge impacts on the business when we divide costs this way, so let's have the central architecture team tell us whether this is a ratified allocation strategy, and then we'll go implement it. We have no problem doing the implementation, but there's a little bit of alignment and corporate incentive there that we have to work through. And then the next wrinkle, which we're currently working on solving (again, we were whiteboarding this on Monday), is that we're getting into a state where we have allocations within allocations: we have tenants inside our Kubernetes that need to do their own level of allocation, and that's actually driving some of the discussions Carl raised about the backwards compatibility of our schema, because now we think we probably do need a new column. So it's a multi-layer approach. Ben, does that answer your question?

Three minutes remain; any final burning questions? [Applause]
Thanks for watching! Check out more FinOps X 2024 content on our YouTube channel, on the 2024 playlist. Support our channel by liking, subscribing, clicking the notification bell, and leaving comments and questions for our speakers. We appreciate it.