YouTube Transcript: FinOps Transformation: Unleashing Comprehension and Agile Data Models
Video Summary

Core Theme: Salesforce transformed its FinOps capabilities by re-architecting its data models and infrastructure to be more agile and comprehensive, moving from manual, spreadsheet-based reporting to a unified, data-driven approach that empowers innovation and builds trust.

Video Transcript
Thanks, everybody, for joining us. This is "FinOps Transformation: How Salesforce Unleashed Comprehensive and Agile Data Models Over the Last Year." My name is George Parker. I'm the Director of FinOps Product Management at Salesforce for internal products, and I also oversee capacity product management.

I'm Danush, the lead FinOps engineer for Salesforce. And a special shout-out to Will Forester, who is probably going to watch us on YouTube at some point. He's our Director of Engineering, our close partner in crime, and he helped us put a lot of this content together today.
A little bit of background before we talk about the details. Salesforce has been running in the cloud for a long time; if you know Salesforce, we pretty much started in the cloud. It was private cloud 25 years ago, but first announced in 2020, we made a pivot to public cloud. We call it Hyperforce. It's a cloud-native architecture that empowered all of our customers' agility: customers that had been running for decades in on-premises data centers were able to port their custom applications and integrations to public cloud. And as everybody here knows, if you pivot to public cloud, you're going to need to change your FinOps tooling, so that's what we've been doing for the last few years.

Danush and I specifically are from the Cloud Economics and Capacity Management team. We provide data analysis and forecasting capabilities for Salesforce infrastructure, and we're one of many teams focused on availability and infrastructure engineering within the broader Salesforce engineering ecosystem.

One of the consistent themes we found about our FinOps at scale is that it is a big data engineering problem at the end of the day. This is because of our scale, but for the die-hard engineers, you're probably going to notice some standard engineering principles that we required in order to scale throughout this presentation. As we tried to address our FinOps concerns, we applied a lot of things that you're probably already doing in your standard engineering teams, so in some ways this shouldn't be novel or new for the engineers, but how we apply them to the FinOps problems is what's unique here in the FinOps context.
Our broad team includes both FinOps practitioners and cost engineering that builds supporting tooling for showback, cost allocations, budgeting, and a whole bunch more. In the following presentation, we're going to deep dive into some of the specific methodologies we used to build our data lake, how we evolved our data models, and how we approach quality as a topic.

We're also not the only team at Salesforce that cares about FinOps. We have some great finance partners here (they're actually going to go listen to Apple right now). We also have a partner sourcing team sitting in the second row; we couldn't do it without them. Our colleague Rahil, whom I just met, is from Slack, part of the Salesforce ecosystem as well. There are a lot of people at Salesforce who care about FinOps, and if you've been listening to the keynotes, you know that it takes the whole community, the whole company, to think about FinOps as a problem. I think Salesforce is a good example of that. We also have some service and budget owners from across the business who represent individual service teams at Salesforce.

There are largely two major orbits of FinOps personas, and our team sits a little bit in the middle, as do sourcing and finance. On one side are the folks that drive corporate strategy: FinOps, finance, execs, and procurement. For this group, our team holds a monthly budget review and tracks weekly KPIs to ensure that we're on track for our budgets and our strategic objectives. To be successful with this audience, we need to solve for high fidelity and extremely rigorous data quality while accelerating these teams through simple-to-use executive dashboarding. Then there are the folks actually driving the change: engineering, product, sustainability, and service owners. For this audience, we focus on building self-service tooling, so to be successful here we focus on making the tooling flexible and on providing deep insights that cater to services.
Today we'll share how we applied agile principles to our FinOps architecture, which ultimately transformed our team from falling behind to innovating over the last two years. Back in 2022, a lot of the reporting done for the executives was all in Google Sheets, and it required a lot of manual time to build. Service owners weren't happy with the tools that existed because they weren't really meeting their needs, and all the work we were doing wasn't having any customer impact. So at the beginning of 2023 we went back to the drawing board and proposed a complete greenfield re-architecture.

To make Salesforce FinOps successful, we focused on re-architecting with a few key agile principles in mind: reacting quickly to changing requirements, continuously reflecting on the progress we had been making, continuously collaborating with our customers to drive product engagement and scale, and consistently improving through incremental investments. With all of these agile principles, within six months we had decommissioned the previous products and rebuilt them from the ground up. During the process we also cleared off the majority of the technical debt. Then we began to innovate, and we delivered numerous new FinOps capabilities that had been requested over the previous years. Our users and customers were happy, and we managed to turn around the internal customer sentiment. The business needs still continue to change, but now we are able to deliver with agility, all while upholding that customer trust.

Throughout this talk we'll be sharing some of the key learnings from the transformation. There were three major factors that were critical to our overall success, and we're going to structure this presentation around them: first, leveraging a unified data lake that enabled us to react quickly to change; second, designing for flexibility to support the actual business needs; and finally, continuously inspecting and improving our data quality.
With that, I'm going to start with the unified data lake, because you can't develop a whole lot of FinOps maturity without a lot of data, and at scale (again, I want to reinforce this) it is a big data problem that requires big data solutions. We'll pause here because I know people like to take pictures; this is just an architecture slide, so I'm going to pause on it so you can take that shot. We're going to walk through this slide in a little bit more detail throughout the deck, highlighting a few of these blocks as we talk about them.

Our first bottleneck to the team's throughput was that our data was in disparate sources. Prior to 2023 we were accessing and reading from a lot of different data sources. Most of your FinOps capabilities require billing data to be joined with other data sets: things like infrastructure metrics, CPU utilization, logs that show customer usage which you might need for unit economics, and metadata like team ownership mappings that help with your governance plans. Every time we needed to onboard a new data set back in 2022, any time a new user request required us to access new data, it would take more than a week to develop, deploy, and ingest a new metric. This created a huge tax on basically every feature request we had; anything to do with expanding our existing FinOps capabilities cost at least a week just for the overhead to get the data. Beyond new ingestion, this slowed our resolution when we had operational impacts or performance and availability challenges, and all of that contributed to user frustration with our toolchain.
To overcome this issue, we made a strategic decision to move off a custom on-prem platform and migrate to a Salesforce-internal, cloud-based unified data lake where the metric logs and metadata were already hosted. We then had to onboard the cloud billing data to the platform, which we did by configuring cloud-native replication. Cloud-native replication provides a low-maintenance, seamless way to ingest billing data into the data lake. By colocating all of the key data sets needed to build FinOps capabilities, we could now join the cloud billing data to either metric logs or metadata by simply switching table references. We also removed the need for, and the operational overhead of maintaining, the previous ingestion pipelines.

By unifying all of these key data sets in the same data lake, you can bring significant agility to your team. This reduced the data ingestion effort from about one week of team effort to less than one hour, which helped us develop 3x new FinOps capabilities in a year with exactly the same team size. We were able to add a new budgeting structure, self-service unit economics, and a workload optimization dashboard. We also improved our shared cost allocation coverage from 70% to 95%, which represents the majority of the Kubernetes cost, and that's also one of the biggest spend drivers.
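As a rough illustration of what "joining by simply switching table references" can look like once billing, usage logs, and metadata are replicated into one lake, here is a minimal sketch. The table and column names (cloud_billing, usage_logs, service_ownership, and so on) are assumptions for illustration, not Salesforce's actual schema.

```python
# Minimal sketch: once billing, usage logs, and ownership metadata live in the
# same lake, a new FinOps metric is "just another join". All table and column
# names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("finops-lake-join").getOrCreate()

cost_per_request = spark.sql("""
    SELECT b.service_tag,
           o.team,
           SUM(b.cost)                        AS daily_cost,
           SUM(u.request_count)               AS daily_requests,
           SUM(b.cost) / SUM(u.request_count) AS cost_per_request
    FROM   cloud_billing     b
    JOIN   usage_logs        u ON b.service_tag = u.service_tag
                              AND b.usage_date  = u.usage_date
    JOIN   service_ownership o ON b.service_tag = o.service_tag
    WHERE  b.usage_date = date_sub(current_date(), 1)
    GROUP  BY b.service_tag, o.team
""")
cost_per_request.show()
```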
So once we were on the unified data lake, our next focus was on making the data models flexible.

Thanks, Danush. That's right. Looking back at the lesson about the unified data lake: if you're running your systems out of multiple different data sources, think about how you might unify them. And once you unify them, you need to start thinking about the flexibility of the data model you're putting them in, because as you define your unified data lake you're setting schemas that will influence how you access and read data in the future.

I'll give you an example from back in 2022 and early 2023, when I switched into the FinOps team last year. I have a history of doing FinOps at other companies, but it was in a different role. In the beginning of 2023 we had really the opposite of a flexible data model; I would call it death by spreadsheet. At Salesforce we have tens of thousands of accounts, so just tracking the new accounts created every month meant that it took one person about three days every month just to add the new account mappings to our data sources so that we could do financial reporting on them. It was way too much; the pace of change was literally faster than we could keep up with. New accounts would just be missed in the monthly reporting cycles and have to be caught up the next month because they simply weren't there.

We've simplified what that looked like on this slide to show just a few pieces of the data, but at the time we maintained all of this by hand in a spreadsheet with about 200,000 cells: roughly 20 columns and 12,000-plus rows. Each row was uniquely indexed by the account number, and it was consistently growing by hundreds of rows every month. In that data source there were actually only a few pieces of interesting data, shown in the table on the left: for us it was something we call a domain, plus the product and the executive that go along with it, the exec that footed the bill and the product it was under, all uniquely indexed by the account. Much of the rest of the data in the other 16 or so columns was just duplicates, copied and pasted over and over again. That in turn made it impossible to tell the difference between an exception and an error: did somebody accidentally drag a row down, and that's why it's different for this one account, or is it because this account really is special? Any new entry was obviously slow to be updated, and it was very difficult to extract the budget relationships between these 12,000 to 15,000 different accounts and tell which services belong to which executives without going through line by line. So if you have a data source like this that you could ascribe to death by spreadsheet, think about how you might remove this anti-pattern.
Instead, what we do now, as of the last year, is model this structure as a set of key relationships. We distilled it down to two: the domain that maps to the product, and the product that maps to the executive. This eliminated all of that duplication of data. We actually don't even need the account ID anymore, which used to be the unique field that made us generate all of those rows; we don't need it because the accounts are all tagged with that domain already, and we can roll that up to product and executive. It's much simpler, and it's higher quality: any exceptions are clearly called out in a separate set of logic, so we ensure there are no errors in this. It's faster and easier: new entries can now be added on a self-service basis by the FinOps teams, and all the downstream tooling reflects the change within about 12 hours. In a lot of cases, when a new account is created we don't even need to make any edits, because we're no longer looking at the unique account ID; if the tags are correct, it's automatically processed and it shows up in the billing the next day. It's also a lot more explainable. The whole thing is about 200 simple lines of configuration that we frequently share with budget owners and executives who want to understand what they're being charged for. I'll literally get a Slack message that says, "Hey, this account showed up on my list the other day, I don't know where it came from," and I can just respond, "Here's the mapping for the whole fleet; it'll take you five minutes to read, just go skim it." I usually get responses like, "Wow, I was always looking for a way to see the overview like that, thank you."

This transition to a config-driven, or what Salesforce oftentimes calls a "clicks not code," approach means that the FinOps practitioners are able to self-service a whole lot more than they could in the past. Last year alone, our FinOps team made over 500 configuration updates in Git itself, updating the budget reporting on their own. We currently manage all of this as YAML on Git, an approach some of you might already be taking as well. I like the Git workflow; I think we all do. It enables audits, CI workflows, and tracking, and that's really helpful for us: we get peer reviews when changes are made, and we have a reference to who made the change and why. If you can replace some of your tabular anti-patterns, some of that death by spreadsheet, with this kind of relationship structure, it is simpler, it is faster, and it is more accurate. And it's even better when it's self-service and accessible to all your non-technical users.
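To make the "relationships, not rows" idea concrete, here is a minimal sketch of what such a YAML-on-Git mapping and its rollup logic might look like. The domain, product, and executive names, and the exact file structure, are illustrative assumptions rather than Salesforce's real configuration.

```python
# Hypothetical sketch: two small mappings replace thousands of spreadsheet rows
# keyed by account ID. An account's domain tag rolls up to product and executive.
import yaml  # pip install pyyaml

CONFIG = """
domain_to_product:
  commerce-search: Commerce Cloud
  einstein-serving: Einstein AI
product_to_executive:
  Commerce Cloud: exec-a
  Einstein AI: exec-b
"""

def rollup(domain_tag: str, cfg: dict) -> tuple[str, str]:
    """Resolve an account's domain tag to its product and executive."""
    product = cfg["domain_to_product"][domain_tag]
    return product, cfg["product_to_executive"][product]

cfg = yaml.safe_load(CONFIG)
print(rollup("einstein-serving", cfg))  # ('Einstein AI', 'exec-b')
```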
That account-mapping example works great for account-level internal billing, but what about resource-level tagging? Some of you might be guilty of this: my last company had insane tagging requirements, like 12 tags per resource, label it with every known piece of metadata you could possibly need. That's the way my last employer did it, and I'm happy to say we don't do that at Salesforce. The problem with that approach is severalfold: it is practically impossible to add or update tags retroactively, and you're going to have human error. At my last company we never really could get FinOps coverage past 95%, mostly due to that human error component.

To solve this at Salesforce, we really only care about one tag now. That one tag is enforced through our corporate objectives, it's reported on globally, and it's enforced in CI pipelines; we add OPA policies when gaps emerge and make sure everybody is using this service tag. The tag needs to be very granular: it ties back to the individual microservices being deployed. Pretty much everything else, shown here in yellow, is derived from a metadata API that we call for all of the other references. Using the service tag and referencing it against the metadata API, we can map it back to teams, to managers, and to engineering orgs, and thanks to our unified data lake we can further join it to that account data, so we know all the details from that interface, adding in context about environments, budgets, and products. As a result, we've improved our reporting coverage from 95% to about 99.9%. We can also now backfill, which is hugely critical for us, providing the capability to retroactively associate new metadata with existing resources. This gives us tremendous flexibility to make a change in the middle of the fiscal year while still offering consistent fiscal-year reporting without weird spikes or jump cuts in our data, and we can even do A/B testing, which we use to develop next year's framework while still reporting on this year's.
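A minimal sketch of the derive-from-one-tag pattern described above, assuming a hypothetical internal metadata endpoint; the URL and field names are illustrative only.

```python
# Hypothetical sketch: everything beyond the enforced service tag is derived at
# processing time from a metadata API, so ownership changes (and backfills) do
# not require re-tagging resources. Endpoint and fields are assumptions.
import requests

def enrich(billing_row: dict) -> dict:
    tag = billing_row["service_tag"]
    meta = requests.get(f"https://metadata.internal.example/services/{tag}").json()
    return {
        **billing_row,
        "team": meta["team"],
        "manager": meta["manager"],
        "engineering_org": meta["org"],
    }
```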
A challenge we ran into when developing new FinOps capabilities at Salesforce was the sheer number of services that exist here. There are about a thousand-plus services, and a central FinOps team trying to build a new FinOps capability for every service, each with varying implementation needs and methodologies, wasn't a scalable approach. So what we focused on was externalizing the implementation details and providing generic processing capabilities.

One example where this was really helpful is unit economics. At Salesforce we care a lot about unit economics: whether it's customer usage driven by internal or external demand, it can drive material cost swings for services. Service owners can't control the demand, but what they can focus on is whether they're staying consistently efficient as scale varies, and unit economics is one metric that helps with that. To onboard a unit economics metric, a service owner writes a YAML-based configuration; YAML provides a simple interface, even for non-technical users, to write configurations. The YAML configuration is then submitted to the onboarding interface, which in our case is Git. As George just said, it provides easy version control and audit tracking. Daily pipelines then ingest all the configurations from the Git process and publish the data to the live Tableau dashboards. Customers can look up their newly processed data on the dashboards and track metrics over time, based on their backfill needs. This has enabled service owners to onboard either a unit economics capability or a cost allocation framework and publish data to live dashboards in less than a day, with almost zero engineering effort from our team.

Diving a little deeper into the onboarding interface and the configuration itself: we capture as much key metadata as possible as part of the interface, across about four areas. One is data governance: who owns the data, and what the classification of the data set is, whether it's internal, restricted, publicly allowed, and so on. Then retention: how long users want to keep the data, whether that's 6 months, 12 months, 13 months, and so on. Then documentation: the documentation field highlights the key input and output metric descriptions, and you can see how, based on this information, we can generate an automated data catalog from all the configurations. The fourth, and key, part is the metric calculation itself. In our case this is a SQL query; the query allows parameters to be filled in at runtime, and the pipelines execute those SQL queries. An example query joins yesterday's infrastructure cost with yesterday's customer usage to calculate the unit metric.
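Here is a minimal sketch of what such an onboarding configuration might look like, with the four areas described above (governance, retention, documentation, and the parameterized metric SQL). Field names, table names, and the query are illustrative assumptions, not the actual Salesforce interface.

```python
# Hypothetical sketch: a service owner commits a YAML file like this to Git; a
# daily pipeline loads it, substitutes runtime parameters, runs the metric SQL
# against the lake, and publishes the result to the live dashboards.
import yaml  # pip install pyyaml

ONBOARDING_CONFIG = """
metric_name: cost_per_api_request
data_governance:
  owner: search-platform-team
  classification: internal
retention_days: 390            # roughly 13 months
documentation: >
  Joins yesterday's infrastructure cost with yesterday's customer usage
  to produce a per-request unit cost.
metric_sql: |
  SELECT c.service_tag,
         SUM(c.cost) / SUM(u.request_count) AS unit_cost
  FROM   infra_cost c
  JOIN   customer_usage u
    ON   c.service_tag = u.service_tag AND c.usage_date = u.usage_date
  WHERE  c.usage_date = DATE '{run_date}'
  GROUP  BY c.service_tag
"""

cfg = yaml.safe_load(ONBOARDING_CONFIG)
query = cfg["metric_sql"].format(run_date="2024-06-19")
# The daily pipeline would execute `query`, publish the output, retain it for
# cfg["retention_days"] days, and catalog cfg["documentation"].
```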
Previously, our primary data model consisted of a single cost output that we called total cost. This was our approach in 2022: it represented the budget impact of a given service, inclusive of discounts, fees, and credits. We ran into limitations with this approach, as it created a monolithic cost number that lacked any detail needed for further analysis. Just one example of how this penalized our efforts: as we began to receive a large amount of one-time credits on random accounts, it created anomalies in our budget run rates for all those services. Execs wanted the one-time anomalies hidden; they wanted to see consistent, predictable numbers without jump cuts in the data, so they asked us to exclude those credits from total cost to stabilize the trend. Unfortunately, that older, less agile data approach limited our ability to execute the change quickly. Excluding credits meant creating separate fields, migrating every stage of our processing pipelines to support those new fields, and then backfilling all of that data. It took us over a month, all while the engineering leaders were fundamentally confused or misled about what their budget impacts were.

To overcome this pain, we redesigned our data models for end-to-end agility, with the goal of being able to react to changing requirements, like the example George quoted, in less than one week. To achieve this target, we introduced a field called charge types. Charge types are nothing but the cost categories which previously formed the composition of total cost. Instead of storing a single output called total cost, we now store cost broken out by the array of charge types useful to Salesforce; some examples are usage charges, commitment plan discounts, credits, and so on. Total cost is now defined as a combination of these charge types, and you can see how we could include or exclude credits in less than a day and publish to our customer-facing dashboards based on the business requirement. This also saves us from schema migrations when adding new charge types, because we add them as rows instead of columns, which avoids the effort of migrating all of our ETL steps.

Beyond just the agility, this data modeling helped us provide deeper insights that previously weren't possible. An example use case: an engineer wanting to track a specific architectural optimization, without the added noise of discounts and credits, could do so by looking up only the usage charge type and the cost associated with it, focusing on that one metric to see how their optimization was going.
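A minimal sketch of the charge-type idea: cost is stored one row per charge type, and "total cost" becomes a configurable sum, so credits can be included or excluded without a schema migration. The data and names below are illustrative.

```python
# Hypothetical sketch: one row per (service, charge_type) instead of a single
# monolithic total_cost column. Numbers are made up for illustration.
rows = [
    {"service": "search", "charge_type": "usage",               "cost": 1200.0},
    {"service": "search", "charge_type": "commitment_discount", "cost": -150.0},
    {"service": "search", "charge_type": "credit",              "cost": -400.0},
]

def total_cost(rows, exclude=()):
    """Total cost as a combination of charge types, optionally excluding some."""
    return sum(r["cost"] for r in rows if r["charge_type"] not in exclude)

print(total_cost(rows))                      # 650.0  -> full bill impact
print(total_cost(rows, exclude={"credit"}))  # 1050.0 -> stabilized trend, credits excluded
```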
Once we had the unified data lake and had designed our data models for flexibility, our next focus was on how we could continuously improve the quality of the data we were publishing to our customers.

The last major piece of this presentation is around data quality. Data is the foundation of FinOps, and if that data is not accurate or not available, the business runs the risk of making decisions on incorrect data, and it will break trust with your users: they won't act on the insights the FinOps team is providing, and it will put your organizational goals at risk. We want to encourage you to build your brand around data competency and maintaining trust in the data, and to continuously prioritize inspecting and improving that data quality.

I'm going to highlight a couple of external artifacts you might want to reference. There's a really good white paper on the FinOps Foundation site called "Building and Maintaining Healthy Working Relationships for FinOps Practitioners," and it talks about the deep importance of trust for FinOps teams; we couldn't agree more. Another reference: in the marketing world there's a concept called brand personality. I'm not a big marketing person, but back in the '90s a marketing industry legend, David Aaker, wrote a framework that said all brands basically subscribe to one of a few specific personalities, listed here: sincerity, excitement, competence, sophistication, and ruggedness. We use this lens to talk with our team and reinforce the importance of what we call data competence. We're not trying to build the most rugged FinOps team, and we're not trying to build the most sophisticated or sincere team, but we want to be really competent with our data. We've talked about why incorrect data creates risk: it hurts trust, it creates operational risk around making bad decisions, it creates trust issues that prevent cultural adoption, and it compromises your accountability structures and governance. To build your brand around data competency and maintain trust in the data, that's the importance of prioritizing the continuous inspection and adaptation of your data quality commitments.
To achieve data quality, we first created a framework that allows our pipelines to run automated data quality checks. Newly processed data is published to customers only if all of the data quality checks pass; if any of them fail, an automated alert is sent to our engineer on call for immediate triage and resolution. Once we had the framework in place, we brainstormed and added data quality rules throughout our software development life cycle. Before releasing a product, we incorporate data quality as a design review component; as part of PR reviews, we make sure every change being pushed has the necessary data quality check associated with it; and in the case of outages and root cause analysis investigations, we brainstorm how a future data quality check could prevent such a scenario. Overall, the data quality framework and the rules have helped us prevent issues in numerous scenarios and helped maintain customer trust. A notable recent one was when a few-million-dollar discrepancy in a specific charge type calculation, caused by a billing column definition change, was automatically caught; the team was able to triage and resolve it, and the incident was prevented from reaching the customer.
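As a minimal sketch of the kind of automated gate described above: newly processed data is published only if it reconciles against the source within a tolerance, and otherwise the on-call engineer is alerted. The threshold, inputs, and hooks are illustrative assumptions.

```python
# Hypothetical sketch: a reconciliation check that gates publication. A charge-type
# total that drifts from the source billing total (e.g. after a billing column
# definition change) blocks the publish and pages the on-call engineer instead.
def totals_match(processed_total: float, billing_total: float,
                 tolerance: float = 0.001) -> bool:
    """Pass if the processed total is within 0.1% of the source billing total."""
    if billing_total == 0:
        return processed_total == 0
    return abs(processed_total - billing_total) / abs(billing_total) <= tolerance

def publish_if_clean(processed_total, billing_total, publish, alert_oncall):
    if totals_match(processed_total, billing_total):
        publish()
    else:
        alert_oncall(f"Charge-type totals diverged: {processed_total} vs {billing_total}")
```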
Once we had the data quality framework, we had a way to measure and report on the quality of the data; it was then important to call our shots and follow through. To us this means setting SLOs and making sure we follow them by tracking and optimizing for them. We track our SLOs across three key metrics: availability, where we want all of our data to land in less than 24 hours; accuracy, where the two sources of data we compare should match to 99.9%; and latency, where we want the dashboards to load in less than 5 seconds. SLOs in our case are service level objectives, which we distinguish from service level agreements: these are our targets, this is what we measure, and this is what we manage.
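For illustration, the three SLO targets quoted above might be tracked with something like the sketch below; only the thresholds come from the talk, while the structure and the measured-value keys are assumptions.

```python
# Hypothetical sketch of the three SLOs as trackable thresholds.
SLOS = {
    "availability": {"description": "data lands within 24 hours",   "max_hours": 24},
    "accuracy":     {"description": "sources reconcile to 99.9%",   "min_pct": 99.9},
    "latency":      {"description": "dashboards load in under 5 s", "max_seconds": 5},
}

def evaluate(measured: dict) -> dict:
    """Compare measured values (hypothetical keys) against each SLO threshold."""
    return {
        "availability": measured["landing_hours"] <= SLOS["availability"]["max_hours"],
        "accuracy":     measured["match_pct"]     >= SLOS["accuracy"]["min_pct"],
        "latency":      measured["p95_load_s"]    <= SLOS["latency"]["max_seconds"],
    }
```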
We meet regularly to look at those service level objectives and review the key business outcomes. One of the things that's really important for us to look at is our support ticket load: how many incidents occurred, how many users were impacted, and what they're asking for, whether it's enablement, issues, or feature enhancements. We also look heavily at monthly active users, which reflects how people are engaging with our products; we currently have about 1,000-plus internal monthly active users using our FinOps tools.

Beyond just measuring and tracking these numbers, it's also important to act on the improvements you identify. That could mean prioritizing actions from a recent root cause analysis. It could also mean improving developer productivity, having a conversation between engineering and product to say we really need to do this so that next release we can go faster. Another piece of prioritizing engineering changes is simply communicating the impact to our users and following up on the RCA; that builds the trust that our team is being diligent and competent with our data. By creating this positive feedback loop, we improved the availability of our tools from about 81% to 98% within the last year, and we believe this brand of data competency helped drive FinOps cultural adoption, leading to the monthly active user improvements on our flagship products.
The last piece of that strategy for quality is our UI. With our old architecture, a custom web app, we had a very limited, basic explorer capability, and it was relatively expensive and slow to add new features and functionality. By switching to a BI tool we got tremendous velocity gains. Being able to just drag and drop our data was itself a huge boon, but we also gained a whole host of other features we would otherwise have had to build ourselves: things like alerting and subscriptions, which I can use to track missing data; sharing capabilities that help me quickly troubleshoot a customer's issue, because they can send me a URL directly to the view they're looking at and I can do the same; and while engineers are troubleshooting production issues, the engineering manager or product manager can go in and add a data warning to the UI, so any customer who then looks at that dashboard is alerted that the data may be faulty and that we're investigating.

This approach works well for us; we're big fans of Tableau. Some of you might be using other tools, and that's totally cool; we just want to encourage you to fall in love with your BI tool and see if you can get velocity gains out of it. A lot of you might also use your cloud provider's native explorer capability, and that's fine too (I was happy with it at a previous company), but if you do have access to a BI tool, I'd encourage you to take a look. We use it to build other capabilities beyond just exploration, which is all we offered when we were a custom web app. Because of the speed gains from developing on a BI tool, we now have self-service unit cost dashboards, we have budget reporting, and there's a whole lot of other stuff that was being done in spreadsheets and slide decks that can now move over.

Our flagship product, not shown here, is our internal Hyperforce Cost Explorer (I told you about Hyperforce at the beginning); again, about 750 users every month. We have an infrastructure budget dashboard used by all the engineering EVPs and their budget and operations teams to track cloud cost performance. We also now provide a dashboard where internal customers can look up all of their onboarded unit economics, with the ability to slice and dice the metrics by some of the key dimensions. They don't have to build a UI or any of the reporting; all they really have to do is set up that YAML file Danush showed us, set their query, and tell us what their unit metric is. We already have all the cost data, so it's very fast: about six service teams onboarded in the last two months, giving service teams the ability to build their own unit economics and leverage the framework we built to do it faster. Because of the power of our BI tool, we also keep a staging version of all of our dashboards; if we're making changes, it's very easy for us to avoid standing up another custom web app and simply keep another copy of our dashboard where we treat the production and staging data differently.
We're really excited to continue the journey this year. With the methods outlined here we accomplished a whole bunch of objective, quantifiable KPI things, but I think the real value has been the trust we built, the quality we built, and the way it's changed our organization, seeing that many more people accessing and loving these tools. I've had people from our company come up to me during this conference and say, "Hey, I heard about that thing you guys built, that's super awesome, I'm starting to use it, I might build a version of it myself." That kind of traction doesn't fit neatly on a KPI slide, but it's really the power of what we did in the last year, and I hope that some of the learnings and methodologies we've shared might help influence how you build your tools at your company. It'll free up space for more innovation, more features, and ultimately more user satisfaction and FinOps adoption.

We'll close before we go to Q&A. Again, we're excited to continue the journey: we're building new product features, and we're adding new levels of budget hierarchy, going more granular down into the org. Shout out to Julia Harvey and Ursula from Nationwide, if anybody knows them; they did a great presentation last year at FinOps X on the coin score, and we totally copied that (plagiarism is the sincerest form of flattery); it's providing us more savings opportunities internally. We all have eyes on FOCUS; I'm personally a member of the FOCUS working group and actively participating there, and we're super excited about how it can automate our ETL processing for other vendors. We're implementing new business process changes, adding new financial metrics, improving our margin analysis, and improving our automation. We're looking at layering in some AI use cases; for us right now that means building some SQL capabilities, since we have a lot of people who ask to be able to query the data, and we can very quickly use gen AI to help with that. And we're just excited to come back to FinOps X, for all the reasons that all of you already know: this kind of community, this kind of sharing, is really why we're here. We'll pause here and open it up for questions.
Thank you both. And sticking with the theme of the conference, I will not be running to you with the microphone, but I will lightly jog. Hands up in the air... got one right here. My light jog.

It sounds like you have internal customers who are reading directly from the data lake and building their own reports, their own dashboards, whatever their specific needs are. How do you handle the changing data schema with a potentially large number of internal customers who may be impacted when, say, all of a sudden your schema aligns to FOCUS, for instance?

It's a good question, and by the way, that's Carl from Walmart; he's an active member of the FOCUS group too, shout out to Carl, brilliant mind. We've mostly been able to keep it relatively stable. We have added a few other tables over the last year, and we're having a discussion now as we start working through multiple layers of cost allocation: it's not just distributing our Kubernetes spend, it's distributing the tenant charges on Kubernetes, so now we're having to peel the onion with allocations that may drive some new columns; we were talking about that on Monday at the office. We've mostly been able to follow an evolutionary schema approach where we just add a column; we're not really having to hard-cut people over yet. It's one of the reasons why, going back to the FOCUS discussions we've had around versioning the API, we want to be really, really thorough if we're going to make any kind of breaking change, and I think all of the major cloud providers are pretty conscious of that too. We just don't want to rip a column out all of a sudden. We've talked about adding a new column and calling it "beta column name" instead of the current column name. So we've been working through that; it's case by case, but mostly I don't think we've had any real breaking changes with our tables, we've just added some more.

Yeah, pretty much the same. We have tried to make it backward compatible so far and we have been successful. Some of the other ways we're thinking of handling it are having V1 and V2 schemas, where customers would have to migrate over to the new schema, although sometimes we abstract through views instead of directly exposing the table, so we can hide some of the intricacies from our customers. But yeah, we haven't run into a real breaking-change scenario yet.
I have a question on pricing and showback. If we have teams trying to do forecasts, and then the showback logic: are you using actual prices, like on-demand rates? We bought a bunch of flex CUDs because our resource CUDs expired, so teams didn't make any quantity changes but they saw a big dollar difference, and now it's throwing us off. So how do you solve for actuals, or a rolling 90-day blended average? How are you attacking that?

That's a good question, and that's Brent. Anybody else who has a question, I encourage you to say your name and what company you're with; I'd love to meet the community here. I had dinner with Brent last night; it was nice to sit across the table from him. Forecasting is a little bit of a different topic than we covered today, but it's a really good question. We're still working through some of the details on forecasting; right now we're doing more of a naive trend-based forecast and working to layer the driver-based stuff on top. Most of what we're doing today is either with the raw usage cost or with the total aggregated cost. There's huge value in being able to separate out and see what the engineering impact is and forecast with that raw usage cost, but at the end of the day it comes down to what's the net-net. We do separate credits out, which helps tremendously; we've pulled that out, so to a large extent we're consistent with our pricing agreements and everything like that. But without going into too much detail about all the nuances of pricing rates, I think we generally look at it from a total cost perspective.
I have two questions. Number one: you talked about re-architecting at the level of the deployments, for example, or in Kubernetes?

For EBS storage there are persistent volume claims; either the claims or the persistent volumes are tagged, so we are able to retrieve those, and when we allocate we are able to provide a unified view of a service's cost, whether the service is running within a pod or on its own instances. So we are able to unify that, but tagging has to go beyond just the resource itself, to things like the pod and anything that's deployable.

Just a follow-up: does it take any CPU or memory metric? Because at the pod level, it then comes back to the container.

Yeah, we use the pod-level requests and the CPU cores used, both, to determine the cost showback. Again, it's a broader discussion, but we use a couple of metrics to determine the costs of each service running within a Kubernetes cluster.
I encourage you to stick around and ask Danush more questions about that; he can go deeper on this topic, so if you want to know more, he's the guy. I'll also add that we have our own metadata API that helps us here. You mentioned service owner, and I would distinguish that from service name. We actually tag with a service name; sometimes it's in the pod label, sometimes it's in a resource tag, but we use that service name as the unique index, and then we use the metadata API to tie it to specific service owners and teams. We can roll that up the org chart and say, okay, that goes to that manager, which goes to that VP, which goes to that product, and so on. So we have another API that we also leverage here.

And another question I have is on that charge type you introduced: how do you show the amortized cost?

Can you give an example? I can take that one. To a large extent we're not looking at the amortized charges; we mostly look at it from the monthly bill perspective, so we're not really getting too much into the amortized piece; we're mostly looking at real bill impact. We can extract that out, and if we want to look at real bill impact versus engineering impact, that's where we look at the raw usage cost. We're mostly not doing the amortization piece right now.
I'm curious. I enjoyed the presentation, it was awesome; it actually really was. I know him, so I'm not just flattering him. I am curious, though: how do you deal with shared costs, and give visibility into that for owners?

That's a good question, Ben, thank you, I appreciate that. Ben Hartwell from Splunk, everybody; a great person to talk to as well. We are in the middle of implementing a self-service allocation interface; we're on about the fourth or fifth allocation, and we're getting to the point where it's largely self-service. The first big one we had to solve was Kubernetes, and I'm actually super excited about the split cost allocation data (SCAD) feature from AWS; they're starting to be able to do allocations by breaking apart your EKS charges in the bill itself. They already support this for ECS and it's coming out now for EKS; there's a PM here from AWS who can talk more about that. So that's probably the first one: if you're thinking about allocations at your company, you're probably thinking about Kubernetes first. We did. We solved the Kubernetes piece on our own, and we may switch to the AWS solution, because I like taking stuff from them instead of having to build and maintain it myself.

As we got through that one, there were other obvious cases where we had shared costs that we needed to break up. Some examples are our VPCs, which are shared network resources; our ODCs, which are shared across teams; and a bunch of other networking and shared resources across the board, where tagging with an individual service name doesn't really make sense because a lot of services are all sharing one resource. We first took the approach of working directly with the team doing all the number crunching (again, Danush is your guy if you want to go deeper). We look at the system utilization, take the cost, divide it by the system utilization, and start attributing it out to the different service owners sharing that resource. There are a bunch of different ways to do that: you can take a flat split, you can divide by system resource utilization, you can come up with weighted averages based on P&L; there's a bunch of ways to do that.
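A minimal sketch of the utilization-weighted option mentioned above, splitting one shared resource's cost across the services using it; the numbers and service names are illustrative.

```python
# Hypothetical sketch: showback for a shared resource (e.g. a shared VPC or a
# Kubernetes cluster), attributed in proportion to each service's utilization.
def allocate_shared_cost(total_cost: float, utilization: dict) -> dict:
    """Split a shared cost across services proportionally to their utilization."""
    total_util = sum(utilization.values())
    return {svc: total_cost * u / total_util for svc, u in utilization.items()}

print(allocate_shared_cost(10_000.0, {"search": 55.0, "billing": 30.0, "vault": 15.0}))
# {'search': 5500.0, 'billing': 3000.0, 'vault': 1500.0}
```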
I'd be careful here, though, especially when you get into chargeback. We're really focused on the showback side of this right now; our budgeting is done a little differently, so we do a lot of this in the context of showback, not chargeback. When you get into chargeback, you're actually creating some serious business incentives for different pieces of your org. I was talking to Udum about this earlier: if you take a single large monolith, an existing, stable, mature product, sharing resources with some brand-new startup projects, and then try to do a flat bill across those, you could essentially put those smaller products out of business, because the weight of the cost on them is just too burdensome. Meanwhile, if it's a shared service you're trying to drive adoption of internally, say you're trying to get everyone to use Vault, and all of a sudden there's a $100,000 bill that a service is going to get for enabling Vault, they're going to say, "Wait a second, maybe I want to build a smaller solution myself," and that actually defeats your corporate objective. So you have to be careful when you get into chargeback.

Again, we've implemented this a couple of times over, and we've been pushing toward making it a self-service interface, because our team isn't staffed to go learn the details of VPCs and get into the weeds on how many bits you should charge, or whether you should charge for the IOPS over the storage itself. We're not really staffed to do that; it should come from the team managing that resource, so we've tried to push it to be a self-service discussion and an interface they can feed to us. I think we're also at the point now where we're starting to give this to our central architecture team and say, hey, this has huge impacts on the business when we divide costs this way, so let's have the central architecture team tell us whether this is a ratified allocation strategy, and then we'll go implement it. We have no problem doing the implementation, but there's a little bit of alignment and corporate incentive there that we have to work through. And then the next wrinkle, which we're currently working on solving (again, we were whiteboarding this on Monday), is that we're getting into a state where we have allocations within allocations: we have tenants inside our Kubernetes that need to do their own level of allocation, and that's actually driving some of the discussions Carl raised about the backwards compatibility of our schema, because now we think we probably do need a new column. So it's a multi-layer approach. Ben, does that answer your question?

Three minutes remain; any final burning questions? [Applause]
Thanks for watching! Check out more FinOps X 2024 content on our YouTube channel, on the 2024 playlist. Support our channel by liking, subscribing, clicking the notification bell, and leaving comments and questions for our speakers. We appreciate it.