Availability: Disrupting data discovery | Technische Universität München

Disrupting data discovery


Bibliographic details
Contributors:	Grover, Mark (Author), Feng, Tao (Author)
Corporate body:	Safari, an O'Reilly Media Company (Contributor)
Format:	Electronic video
Language:	English
Published:	[Place of publication not identified] O'Reilly Media, Inc. 2019
Edition:	1st edition.
Subjects:	Lyft (Firm)
	Strata Conference (2019 : San Francisco, Calif.)
	Business enterprises > Computer networks
	Decision making > Data processing
	Electronic data processing > Management
	Entreprises ; Réseaux d'ordinateurs
	Prise de décision ; Informatique
	Electronic videos
Links:	https://learning.oreilly.com/library/view/-/0636920340027/?ar
Summary:	Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can trust them. Unfortunately, data discovery is very inefficient today. Countless hours are lost trying to find the right data to use. (The most common way still remains to ask a coworker.) Gaining trust in data requires running a bunch of queries (max timestamp, counts per day, count distincts, etc.) that waste time and add unnecessary load on the databases. There's no clear way to find people who can answer questions about a table. And worst of all, analysis is often redone and models are rebuilt because previous work isn't discoverable. Lyft has reduced the time it takes to discover data by 10x by building its own data portal, Amundsen. Amundsen is built on three key pillars: an augmented data graph, an intuitive user experience, and centralized metadata. Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.). What's unique to Amundsen is that it treats people as a first-class data asset; in other words, there's a graph node for each person in the organization that connects to other nodes (like tables and dashboards). In addition, Amundsen runs PageRank using data from access logs to power search ranking, similar to how Google ranks web pages on the internet. Finally, Amundsen gathers metadata from various sources (Hive, Presto, Airflow, etc.) and exposes it in one central place. The right place to store all this metadata is a work in progress. Mark Grover and Tao Feng (Lyft) offer a demo of Amundsen and lead a deep dive into its architecture, covering how it leverages centralized metadata, PageRank, and a comprehensive data graph to achieve its goal. They also explore the future roadmap, unsolved problems, and its collaboration model.
This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco.
Description:	Online resource; Title from title screen (viewed October 31, 2019)
Extent:	1 online resource (1 video file, approximately 42 min.)
Format:	Mode of access: World Wide Web.
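
The record's summary says that Amundsen stores people and data assets as nodes in one graph and runs PageRank over access-log data to rank search results. The record gives no implementation details, so the following is only a minimal sketch of that idea, not Amundsen's actual code: the node names, the access log, the `pagerank` function, and the damping factor of 0.85 are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {node: base for node in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical access log: each pair means "this person queried this table".
# People and tables are both nodes, as in the graph the summary describes.
access_log = [
    ("person:alice", "table:rides"),
    ("person:alice", "table:drivers"),
    ("person:bob", "table:rides"),
    ("person:carol", "table:rides"),
    ("person:carol", "table:payments"),
]

ranks = pagerank(access_log)

# Order tables by rank so the most-accessed ones would surface first in search.
tables_by_rank = sorted(
    (node for node in ranks if node.startswith("table:")),
    key=ranks.get,
    reverse=True,
)
print(tables_by_rank[0])
```

In this toy graph the table queried by the most people ranks highest. A production system would presumably combine such a score with text relevance rather than use it alone.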

Internal format

MARC


LEADER	00000cgm a22000002c 4500
001	ZDB-30-ORH-048540463
003	DE-627-1
005	20240228120934.0
006	m o | |
007	cr uuu---uuuuu
008	191206s2019 xx ||| |o o ||eng c
035			|a (DE-627-1)048540463
035			|a (DE-599)KEP048540463
035			|a (ORHE)0636920340027
035			|a (DE-627-1)048540463
040			|a DE-627 |b ger |c DE-627 |e rda
041			|a eng
082	0		|a E VIDEO
100	1		|a Grover, Mark |e VerfasserIn |4 aut
245	1	0	|a Disrupting data discovery |c Grover, Mark
250			|a 1st edition.
264		1	|a [Erscheinungsort nicht ermittelbar] |b O'Reilly Media, Inc. |c 2019
264		2	|a Boston, MA |b Safari.
300			|a 1 Online-Ressource (1 video file, approximately 42 min.)
336			|a zweidimensionales bewegtes Bild |b tdi |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
500			|a Online resource; Title from title screen (viewed October 31, 2019)
520			|a Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can trust them. Unfortunately, data discovery is very inefficient today. Countless hours are lost trying to find the right data to use. (The most common way still remains to ask a coworker.) Gaining trust in data requires running a bunch of queries (max timestamp, counts per day, count distincts, etc.) that waste time and add unnecessary load on the databases. There's no clear way to know how to find folks to answer questions about the table. And worst of all, many times analysis is redone and models are rebuilt because previous work isn't discoverable. Lyft has reduced the time it takes to discover data by 10x by building its own data portal, Amundsen. Amundsen is built on three key pillars: an augmented data graph, an intuitive user experience, and centralized metadata. Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.). What's unique to Amundsen is that it treats people as a first-class data asset; in other words, there's a graph node for each person in the organization that connects to other nodes (like tables, and dashboards). In addition, Amundsen runs PageRank using data from access logs to power search ranking, similar to how Google ranks web pages on the internet. Finally, Amundsen gathers metadata from various different sources (Hive, Presto, Airflow, etc.) and exposes it in one central place. The right place to store all this metadata is a work in progress. Mark Grover and Tao Feng (Lyft) offer a demo of Amundsen and lead a deep dive into its architecture, covering how it leverages centralized metadata, page rank, and a comprehensive data graph to achieve its goal. They also explore the future roadmap, unsolved problems, and its collaboration model. This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco.
538			|a Mode of access: World Wide Web.
610	1	0	|a Lyft (Firm)
611	2	0	|a Strata Conference |c San Francisco, Calif.) |d (2019
650		0	|a Business enterprises |x Computer networks
650		0	|a Decision making |x Data processing
650		0	|a Electronic data processing |x Management
650		4	|a Entreprises ; Réseaux d'ordinateurs
650		4	|a Prise de décision ; Informatique
650		4	|a Electronic videos
700	1		|a Feng, Tao |e VerfasserIn |4 aut
710	2		|a Safari, an O'Reilly Media Company. |e MitwirkendeR |4 ctb
966	4	0	|l DE-91 |p ZDB-30-ORH |q TUM_PDA_ORH |u https://learning.oreilly.com/library/view/-/0636920340027/?ar |m X:ORHE |x Aggregator |z lizenzpflichtig |3 Volltext
912			|a ZDB-30-ORH
935			|c vide
951			|a BO
912			|a ZDB-30-ORH
049			|a DE-91

Record in the search index

DE-BY-TUM_katkey	ZDB-30-ORH-048540463
_version_	1831287162272743424
adam_text
any_adam_object
author	Grover, Mark Feng, Tao
author_corporate	Safari, an O'Reilly Media Company
author_corporate_role	ctb
author_facet	Grover, Mark Feng, Tao Safari, an O'Reilly Media Company
author_role	aut aut
author_sort	Grover, Mark
author_variant	m g mg t f tf
building	Verbundindex
bvnumber	localTUM
collection	ZDB-30-ORH
ctrlnum	(DE-627-1)048540463 (DE-599)KEP048540463 (ORHE)0636920340027
dewey-raw	E VIDEO
dewey-search	E VIDEO
edition	1st edition.
format	Electronic Video
id	ZDB-30-ORH-048540463
illustrated	Not Illustrated
indexdate	2025-05-05T13:25:32Z
institution	BVB
language	English
open_access_boolean
owner	DE-91 DE-BY-TUM
owner_facet	DE-91 DE-BY-TUM
physical	1 Online-Ressource (1 video file, approximately 42 min.)
psigel	ZDB-30-ORH TUM_PDA_ORH ZDB-30-ORH
publishDate	2019
publishDateSearch	2019
publishDateSort	2019
publisher	O'Reilly Media, Inc.
record_format	marc
title	Disrupting data discovery
title_auth	Disrupting data discovery
title_exact_search	Disrupting data discovery
title_full	Disrupting data discovery Grover, Mark
title_fullStr	Disrupting data discovery Grover, Mark
title_full_unstemmed	Disrupting data discovery Grover, Mark
title_short	Disrupting data discovery
title_sort	disrupting data discovery
topic	Lyft (Firm) Strata Conference San Francisco, Calif.) (2019 Business enterprises Computer networks Decision making Data processing Electronic data processing Management Entreprises ; Réseaux d'ordinateurs Prise de décision ; Informatique Electronic videos
topic_facet	Lyft (Firm) Strata Conference San Francisco, Calif.) (2019 Business enterprises Computer networks Decision making Data processing Electronic data processing Management Entreprises ; Réseaux d'ordinateurs Prise de décision ; Informatique Electronic videos
work_keys_str_mv	AT grovermark disruptingdatadiscovery AT fengtao disruptingdatadiscovery AT safarianoreillymediacompany disruptingdatadiscovery
