YouTube Transcript:
Enterprise Computing Year 12 Unit 1: Data Science

Video Transcript

Enterprise Computing HSC course unit one

data science so in this data science

unit we are establishing that data is

the foundation of all systems we need

data in our systems because data is what

supports our decision making processes

as humans and also indirectly using AI

to interpret data and then us still

viewing what the AI interprets to

understand what actions we should take

within our Enterprise what we

should do moving forward so this whole

unit is targeted at understanding data

so the first subsection is that of

collecting storing and analyzing data

three separate processes that are

aligned with data in how we get it how

we handle it and how we understand it

firstly it's understanding the

difference between quantitative and

qualitative data quantitative being data

we can measure as numbers and amounts and

qualitative being descriptive data we

need to understand that

distinction in this day and age it's

very easy to collect data and store data

but then that brings us to our next

point of Big Data the fact that we can

get data very easily but this data then

accumulates and takes up file space so

we need specialized systems and make use

of online storage for storing Big Data

because it is hard to store but having

that data is valuable because we can

make lots of analyses and interpretations

from that data so we need to build our

systems around the notion of Big

Data one fact that you might know about

data is that we need to store data as

specific data types the way we store

data impacts on its function what

software can be used with it the compression

that can be applied to it and the tools

we can use to gather it and interpret it

so that is the data types and the basic

data types are text and number which can

be in the form of integers and floating

points okay but then we also have

booleans where we can select

specifically if it's an on or off or yes

or no type of response within it and

then obviously the file extensions that

can be applied to data as well all of

those impact on the data's type and

determine how a system will use that

data from here then we have

measurements applied to data so a

variety of different ways of scaling

data and grasping how large it could be

and how it could be used there and

there's some interesting terms there

that I won't go into at the moment

because I'm still learning them myself
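
Going back to the basic data types mentioned a moment ago, here is a minimal Python sketch of text, integer, floating point and boolean values. The record and its field names are made up for illustration and are not from the course.

```python
# Hypothetical customer record; the field names are made up for illustration.
record = {
    "name": "Alex",        # text (str)
    "orders": 3,           # number stored as an integer (int)
    "balance": 49.95,      # number stored as a floating point (float)
    "subscribed": True,    # boolean: yes/no, on/off (bool)
}


def describe_types(rec):
    """Return a mapping of field name to the name of its data type."""
    return {field: type(value).__name__ for field, value in rec.items()}


types = describe_types(record)
for field, type_name in types.items():
    print(f"{field}: {type_name}")

# The data type decides which operations make sense on the value.
total_text = "3" + "4"   # text values are joined end to end
total_number = 3 + 4     # number values are added
```

The last two lines show why the type matters: the same plus sign joins text but adds numbers, so storing a number as text changes what the system can do with it.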

from here we've got data sampling

when we are getting data from the

environment and putting it into the

system and one such area we know with

data sampling is that of gathering audio

data that's actually called sampling but

we're going beyond that here too and

then the notion of active and passive

sampling when we are intentionally

getting specific types of data or

whether the system is doing it itself

automatically and Gathering data for

us we then have aspects related to data

relevance okay how relevant is the data

for the enterprise's operations

accuracy the uh correctness of data that

we are getting which we're using in our

system the validity that data is valid

and follows the appropriate rules for it

to be correct by the system and then the

reliability of data in satisfying what

we are using it for so they do overlap

they are all features of each other but

all slightly different definitions there

ultimately that we are getting data for

our system that is correct and

meaningful for our operations
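
As a rough sketch of what following the appropriate rules can look like, the Python below applies a format check and a range check to a made-up sign-up form. The four-digit postcode rule and the 0 to 120 age limits are assumptions chosen for the example, and `validate_entry` is a hypothetical helper name.

```python
import re

# Hypothetical rules for a made-up sign-up form.
POSTCODE_PATTERN = re.compile(r"\d{4}")  # format check: exactly four digits
AGE_RANGE = (0, 120)                     # range check: assumed sensible limits


def validate_entry(postcode, age):
    """Return a list of validation problems; an empty list means the entry passed."""
    problems = []
    if not POSTCODE_PATTERN.fullmatch(postcode):
        problems.append("postcode is in the wrong format")
    if not isinstance(age, int) or isinstance(age, bool):
        problems.append("age must be a whole number")
    elif not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        problems.append("age is outside the allowed range")
    return problems


print(validate_entry("2000", 17))    # passes both checks
print(validate_entry("20O0", 17))    # letter O typed instead of a zero
print(validate_entry("2000", 170))   # age beyond the range limit
```

An entry can pass both checks and still be inaccurate, for example a real postcode that belongs to a different suburb, which is why validity and accuracy are treated as separate ideas.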

we then have informatics supporting our

understanding of data now this can be

done in a variety of ways but the way

data is then starting to get displayed

in the system we can start making those

interpretations which brings us then to

our next point of presenting data and

I've got this one in yellow because this

might be ways you are presenting it

within your assessment task through

graphs and infographics okay which begin

illustrating data and making it easier to

comprehend those spreadsheet style

dashboards where we have data and it's

represented visually but then if we

start changing the data within this uh

spreadsheet then the actual graphs that

are on display and the pivot tables that are

on display they change live in response

to the values we are changing we can use

data then to generate reports as an

output of data that could be presented

and have our own interpretations written

on it and we can also establish things

such as network diagrams and maps which

can show obviously the makeup of

different segments of a network or a

geographical location and how data might

differ in it as it's dispersed

across a specific Network or landscape

so all those features can be used to present

data then we can talk about structured

and unstructured data sets and this can

be affiliated with big data but

essentially as data is accumulated is it

in a structured format or is it just

Gathering numbers and we need to kind of

structure it later and what we do there

okay so we need to differentiate between

those two forms of data sets Okay from

here then we also need to gather sources

of feedback based on our data or based

on our system okay where data is acting

uh and we are getting it but then what

response are we doing in relation to

that data because that's what it's all

about we get the data and we make a

response so it ensures that we know

what our sources of feedback are we have

criteria to make sure that our feedback

is effective and appropriate for

whatever our system is

doing now we then come to errors in data

and errors can be detrimental to

operations so they must be identified so

that they can be addressed errors can

come at the initial point of collection

from our data sources which is why it's

so important to cross reference our data

sources if we are putting incorrect data

into our system it will ultimately be

incorrect information and once processed

create incorrect values that our system

will process and we need to make sure we

identify that so we then don't use those

incorrect values as a part of our

decision-making processes as said this

stems to raw versus processed data okay

when data goes in raw we haven't checked

it there could be errors there and if

it's then processed okay it will lead to

incorrect operations taking place so we

need things such as validations and

verifications in place to check data

when it is entered into a system that it

does go in correctly and if someone

accidentally does do a typo okay it will

identify this is the wrong format or

doesn't follow the range limits we put

on it okay so there are rules in place to

ensure that when data goes into the

system through validation and through

verification it's

entered in a correct format but as said

if the data source is incorrect what we're

going to get from it is still going to go

in incorrectly so we need to do our own

research on our end to cross check data

and make sure it's correct the other

area of error related to data is that of

bias that we are selecting data sources

that skew data to a specific way that we

want it to be this can be intentional or

it could be unintentional that we're

just not doing a wide enough level of

research and Gathering data from a wide

enough array of different sources for

our system so we've got to factor in that

bias can lead to errors as well and

we've got to try to counteract that by

getting data from a variety of locations

in a bunch of diverse locations that

fully represent the scope of data we're

trying to represent within our enterprise

system the next one then is blockchain

blockchain being that we can track the

movement of data and this is obviously

heavily affiliated with cryptocurrency

and that might be the best way to

understand that we can actually see how

a cryptocurrency such as Bitcoin has

moved through different ownerships and

we can actually track it from its

Inception so we can actually track data
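
The tracking idea can be sketched as a small chain of records where each one stores a hash of the one before it. This is only the linking idea in Python, with made-up owner names and hypothetical helper functions; it leaves out the network of computers, digital signatures and consensus rules that a real blockchain such as Bitcoin relies on.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the very first block


def block_hash(block):
    """Hash the block's contents (everything except its own stored hash)."""
    payload = json.dumps(
        {key: block[key] for key in ("index", "owner", "previous_hash")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, owner):
    """Append a record of who now owns the item, linked to the previous block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block = {"index": len(chain), "owner": owner, "previous_hash": previous_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check every block's own hash and its link to the block before it."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_previous = chain[i - 1]["hash"] if i > 0 else GENESIS_HASH
        if block["previous_hash"] != expected_previous:
            return False
    return True


chain = []
for owner in ["Ava", "Ben", "Chen"]:  # made-up owners
    add_block(chain, owner)

history = [block["owner"] for block in chain]  # ownership from the first block on
print(history, is_valid(chain))
```

If someone later edits an earlier record, for example changing the second owner, the stored hash no longer matches the contents and `is_valid` returns False, which is what makes the history traceable.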

that's what blockchain is all about so

areas where blockchaining can be used

such as for online voting and tracking

who's doing specific voting um online

identities and what those identities are

doing the movement of specific items

this could be digital items or

physical items but knowing who has

ownership of them and thus supporting

recordkeeping we can put a name to these

things okay and I should specify with

online voting too they're probably not

tracking the name of the person they're

probably just tracking that they voted

you're not allowed to track who they

actually voted for and all that because it

is meant to have anonymity to it and

all of

that the next area is then privacy and

security of data and specific tools that

we've got to be aware of where we might

have to put security procedures in place

one such one which is obvious is

AutoFill it's great that our personal

information and our financial

information can be remembered by our

browser and be integrated and inserted

into text boxes automatically when we

do an online purchase but then

there's a security Factor related to

that so convenience can be at the cost

of security we've got to weigh that up with

our system we also have that of private

and public connections it's great when we

go somewhere such as a public library

and access a public network connection

but is it secure whereas if I use my own

hotspotting or if I just do my work from

home I have a better private connection

there the use of checkboxes too can

also be a factor in relation to security

when we're switching things on and off

and how it's being used and then also

terms of agreements for the things that

we sign up for are we actually reading

them and that's a big issue

specifically with online platforms

do we fully understand what we're

signing up for it is stated in their terms

of agreement they do say such as through

social media platforms how they're going

to use our data but we didn't even read

it because you know sometimes we sign up when

we're young and we don't even care but

those terms of agreement could say that

they're going to use the pictures we're

uploading to a platform as a part of

their own business okay or it could also

limit what we can do as a part of their

licensing agreement how we use their

specific data and platforms so all this

is important and it's all written within

terms of agreement it's just so long to

read and that's also an issue there in

relation to privacy and security

and then we also have the impact of data

scale the amount of data that is

available we are very data Rich these

days it's very easy to get data as said

with big data so we've got to factor in

the volume of raw data we're putting

into our systems how much we're putting

in where's it going to be stored whether

locally or in online platforms which is

more so the case so that it can be

networked as part of a large enterprise

system how data might not necessarily be

downloaded uh from these online

platforms but it's more likely to be

streamed live to keep the data off the

local storage and keep it on the online

storage for

efficiency the way machine learning

interacts with data so machine learning

is obviously when the AI is learning

itself so based on it accumulating data

it changes its responses and interprets

data in different ways so that

accumulation helps it learn data can

also impact on human behavior us as

humans responding to data what do we see

how do we change our actions in response

to data and then the ethical

implications of data what do we do in

response to data and also where are we

getting data from is it always ethical

how we get the data all right and where

can we read data from and who owns that

data so there's many aspects to data and

specifically the collection and who is

viewing it that relate to the ethics of

it okay not all data can be public

because it relates to

private individuals in some cases so

there's many ethical implications in

relation to the impact of data okay the

final two things I'll talk about in this

section is firstly data storage how data

will be stored and I've already said it

a few times that we have data that could

be stored on the local storage of a

system on its hard drives and solid

state drives we can also have local network

storage where we have our own servers

but also these days as well we have

cloud storage and then a variety

of ways that can be used public clouds

private clouds hybrid clouds and then

that often is the foundation for the

enterprise system and the sharing of

data across a Global Network for that

system so that is then the data storage

but then we also have this thing called

a data warehouse because we have so much

data okay sometimes we take data from

specific time periods so it might be

last year's data related to last year's

customers okay and then we save that

away to a data warehouse once put in

that data warehouse it might then go

with all our previous years okay worth

of data in that warehouse and we store

it there to analyze that data using

Technologies such as OLAP okay which are

used for data mining and in that data

warehouse we then can look for Trends

and patterns in historical data that can

support us in planning for future

operations so a very supportive tool not only

for the storage of data but also the

analysis of data okay and hopefully

assist us with predicting successful

plans for the future the next section

then is that of data quality data

quality means that obviously data is

correct and reliable but also that data is

meaningful for the operations of an

Enterprise so firstly is the ethical use

of data as we already said with ethical

implications we've got this data now we

need to control who can view this

data and that might be linked to

permissions and who the data is relevant

to as a part of their operations within

the Enterprise and also the sharing of

data and data transparency and the fact

that we have people's personal data

we've got to keep it secure from cyber

security and things like that as well so

we've got to keep an ethical lens on

when accessing data realizing data is

valuable and we've got to keep it

private this links us to our social

legal and ethical issues such as bias

which I spoke about before where we can

skew data in different directions and we

should try to get data from a variety of

sources the accuracy of data and how

correct it is the use of metadata the

data behind data okay which is the

fundamentals of databases and websites

and the fact that that also needs to be

kept private because that has uh links

to private

information copyright of specific data

and systems and the acknowledgement of

sources of data that are used within our

systems that we are referencing systems

companies people who produce data when

being used with our systems and then

stemming from that IP intellectual

property and then ICIP Indigenous

Cultural and Intellectual Property okay that

we know the laws that are around these

things and we've got to respect those

laws when we are using systems and data

that are under IP or

ICIP the establishment of permissions

rights and privacy rules around data

which we've mentioned before once again

to limit who can view data within

systems while we can all work for the

same Enterprise we shouldn't all have

access to all data of the Enterprise

that's why permissions and rights are

important to establish and then our

security tools for protecting our system

and our Network okay our login

procedures our use of Biometrics

encrypting data in transmission and in storage

setting up a firewall for our Network a

whole variety of tools built to protect

our network from cyber security threats

specifically on the legal aspects of

data too we need to know existing

legislation in place such as the

Privacy Act 1988 and those principles

that surround it okay and then also if

we're unsure about things and we need

guidance who are the responsible

authorities we know the government but

then who within the government groups

such as the OAIC who we contact in the

instance of a data breach things like

that okay that we need to know

specifically who to go to in instances

where there are concerns about data then

we also need to know about data

sovereignty of indigenous peoples and

how we support them and how data is used

in the context of their cultures and

their community and we still respect

their traditions and beliefs in how we

use that data to support

them okay and then we've got curated and

communicated data on social behavior

okay understanding things such as data

literacy how to actually specifically

understand data timelines of data and

how it is used okay signals and data

swamps and then educating users in this

area once again an area that I need to

look in more to get my own understanding

about it so that final point is relevant

to me too but there's some key terms

that are also very new to this course in

relation to data and social

behavior the final section is processing

and presenting data so data has been

processed turned into information and we are

putting it into a format that we can show

stakeholders clients or peers so that it

is ultimately comprehensible to them so

kind of the output of data that has been

digested in a way for people to

understand and here you're going to see

a lot more Yellow Boxes because it could

correlate to the things that we could have

embedded into our assessment task so

the first one is that of flat file databases

setting up a simple one-table database

that shows a variety of Records usually

related to one specific area that is

done using um a database package such as

Microsoft Access we also then have

spreadsheet summaries for the

collation of information so this could

be as a user collating information

within the spreadsheet's rows and

cells and all that but it could also be

that I've got a form on the front end uh

for the collection of data that I've

sent it out as a Google form and I've

shared it with a whole bunch of people

and then when they enter in their

responses it updates in the spreadsheet

okay and then from that spreadsheet I

can then develop things such as um

graphs and tables that summarize data

and make it more comprehensible which

then brings us to our next point of

filtering grouping and sorting data we

can use tools within the spreadsheet to

uh categorize our data and add filters

so we can look at specific data sets and

summarize data and focus on specific

groups we can link sheets with other

sheets and we can also make use of a

thing called conditional formatting

where specific values that meet certain

rules will be highlighted okay it could

be highlighted in red if a certain value

is negative or highlighted green if a

certain value represents that a certain

area is doing well this helps us

with data comparisons and then as said

before we can have forms acting as the

front end for our spreadsheet um

collecting data from a variety of

clients and users okay for us to

accumulate data within our spreadsheet

but then we could also have reports for

our summary that we put this all into a

formatted view to be printed off or

sent out digitally that summarizes all

the information for our

stakeholders a very modern tool these

days and this is also in conjunction with

spreadsheets is that of dashboards so

dashboards are like a very graphical

setup for a spreadsheet and in many

cases we actually get rid of the grid of

the spreadsheet so that big tabular

format kind of disappears and it's all

kind of text boxes and visualizations on

screen uh that are used to represent the

actual data so there will be a few

numbers on screen but it's more the

visualizations visualizations in the

forms of graphs but these graphs might

um change based on us entering different

data and data sets but that could be us

manually entering it we could also be

using things such as pivot tables and

slicers so tables that will shrink and

enlarge based on what slicers are active

so it could be that I have a specific

category of information you could think

of it as subjects at school and when I

click um English Advanced only English

Advanced students will appear in the

table and the marks allocated to them

but then if I also click English

Standard English Standard students and

English Advanced students will appear in

the table with their marks all together

side by side so the table will adjust

depending on what slicer categories I

have switched on and off and then that

could also be linked to a graph that is

also adjusting accordingly and

representing metrics in a visualized

format visualization being key and

obviously visualization is now being

introduced here and that correlates with

our next unit of data visualization in

the Enterprise Computing year 12 course
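
The slicer behaviour described above can be approximated in a few lines of Python. The student names, subjects and marks are made up, and `apply_slicer` and `average_by_subject` are hypothetical helper names rather than spreadsheet functions.

```python
# Made-up student marks; names and numbers are illustrative only.
marks = [
    {"student": "Ali", "subject": "English Advanced", "mark": 82},
    {"student": "Bea", "subject": "English Standard", "mark": 74},
    {"student": "Cam", "subject": "English Advanced", "mark": 90},
    {"student": "Dev", "subject": "Mathematics", "mark": 68},
]


def apply_slicer(rows, active_subjects):
    """Keep only the rows whose subject is switched on in the slicer."""
    return [row for row in rows if row["subject"] in active_subjects]


def average_by_subject(rows):
    """Summarise the visible rows, like a pivot table of average mark per subject."""
    totals = {}
    for row in rows:
        total, count = totals.get(row["subject"], (0, 0))
        totals[row["subject"]] = (total + row["mark"], count + 1)
    return {subject: total / count for subject, (total, count) in totals.items()}


# One category switched on, then two categories switched on together.
advanced_only = apply_slicer(marks, {"English Advanced"})
both_english = apply_slicer(marks, {"English Advanced", "English Standard"})
summary = average_by_subject(both_english)

print([row["student"] for row in advanced_only])
print(summary)
```

A chart linked to `summary` would redraw whenever the set of active subjects changes, which is the same live behaviour as a graph tied to a pivot table.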

we then have the design of a relational

database so these are the databases that

are larger than flat file databases and

have multiple tables that we often refer

to as entities we create each of these

entities using a data dictionary that

allows us to establish metadata for each

of the entities what is the actual name

of the actual categories in these

entities which we refer to as fields okay

what data types are they made up of giving

descriptions about it how long

will they be how much allocation of

memory will we give for each one we

provide examples of data and describe

the data these are all categories

included in a data dictionary as said we

use multiple entities to make a

relational database but we connect them

through relationships through primary

and foreign Keys each actual entity

needs to have a primary key which is its

main key usually an ID field that is a

specific number format and then we can

drag that over as a foreign key okay

the exact same number to another entity

to establish that relationship once we

have these relational databases

set up we can search them and sort them

and one uh very fundamental way of doing

that is using SQL structured query

language where we have a series of

keywords used for selecting different

fields and extracting it from specific

tables and then applying a condition

using the WHERE keyword making use of

operators to say if data is greater than

less than equal to or combining criteria

together using AND and OR a whole

variety of tools for searching and

sorting within a relational database
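
To make the keywords concrete, here is a small sketch using Python's built-in sqlite3 module. The Student and Result tables, their fields and the marks are all made up for illustration; StudentID is the primary key in Student and appears again as a foreign key in Result. SQLite's dialect may differ slightly from the SQL style used in exam questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys

# Two made-up entities linked by a primary key and a foreign key.
conn.executescript("""
CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL
);
CREATE TABLE Result (
    ResultID  INTEGER PRIMARY KEY,
    StudentID INTEGER NOT NULL REFERENCES Student(StudentID),
    Subject   TEXT NOT NULL,
    Mark      INTEGER NOT NULL
);
INSERT INTO Student VALUES (1, 'Ali'), (2, 'Bea'), (3, 'Cam');
INSERT INTO Result VALUES
    (1, 1, 'English Advanced', 82),
    (2, 2, 'English Standard', 74),
    (3, 3, 'English Advanced', 90),
    (4, 3, 'Mathematics', 55);
""")

# SELECT fields FROM tables WHERE conditions, combined with AND, then sorted.
high_english = conn.execute("""
    SELECT Student.Name, Result.Mark
    FROM Student, Result
    WHERE Student.StudentID = Result.StudentID
      AND Result.Subject = 'English Advanced'
      AND Result.Mark > 80
    ORDER BY Result.Mark DESC
""").fetchall()

# The same pattern with OR inside brackets to widen the criteria.
low_or_standard = conn.execute("""
    SELECT Name, Subject, Mark
    FROM Student, Result
    WHERE Student.StudentID = Result.StudentID
      AND (Result.Mark < 60 OR Result.Subject = 'English Standard')
    ORDER BY Name ASC
""").fetchall()

print(high_english)
print(low_or_standard)
conn.close()
```

The condition `Student.StudentID = Result.StudentID` is what uses the relationship between the two entities: it matches each result to the student whose primary key it carries as a foreign key.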

but also there's things such as QBE

within um modern database Management

systems that can do all this for us

using interfaces but we're still going

to need to know SQL because we're going to be

doing this in HSC and we can't use

software in the HSC we've got to do it

with our minds writing out the specific

code and then we mentioned them

before forms and reports um in relation

to filtering grouping and sorting data well we

can set them up using um database

Management Systems in a relational

database for collecting data and

displaying data at both the front end of

collection and at the back end of

displaying information the final thing

about this unit is that of machine

learning and statistical modeling and

obviously very modern these days and

obviously a new part of the course in

that we now have systems with neural

networks that can learn themselves so

they accumulate all this data they

interpret all this data and then they

give us feedback and present the

visualization itself present the

statistics to us in a formatted View

summarizing it for us making our life

a lot easier

because one of the whole

themes of this unit that you've seen

with data science is how much data we

are collecting now okay terabytes of

data exabytes of data now okay data

amounts that we can't comprehend and

these larger Enterprise systems are

doing them daily the amount of data

think about how much data Google gets in

a day so if we can have machine learning

supporting us in this processing and

then giving us its output in a

statistical format in a model that we

can understand because it's a good

summary of that data that is of great

benefit to us as humans so I hope this

video has given you an understanding of

this first unit of data science a lot of

new technical terms in this unit and

essentially the purpose of the unit

understanding the foundations of data in

how we collect it how it is made how we

store it how we analyze it and

essentially how it is of data quality

how it is of quality to us how it is

meaningful to us in our operations so we

need to understand and be able to

comprehend it as said this unit kind of

then stems into the second unit of data

visualizations where we start turning

data into a format that is

comprehensible and usable and thus

meaningful to present to people who

aren't as educated in Computing and in

data so they can understand it and use

it for their purposes but we'll get into

that when we do our next mind map on

data visualizations but hopefully at

this point you understand what data

science is all about for the Enterprise
