Disrupting data discovery:
Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can trust them. Unfortunately, data discovery is very inefficient today. Countless hours are lost trying to find the right data to use. (The most common wa...
Contributors: | Grover, Mark; Feng, Tao |
---|---|
Corporate body: | Safari, an O'Reilly Media Company |
Format: | Electronic video |
Language: | English |
Published: |
[Place of publication not identified]
O'Reilly Media, Inc.
2019
|
Edition: | 1st edition. |
Subjects: | |
Links: | https://learning.oreilly.com/library/view/-/0636920340027/?ar |
Summary: | Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can be trusted. Unfortunately, data discovery is very inefficient today. Countless hours are lost trying to find the right data to use. (The most common way is still to ask a coworker.) Gaining trust in data requires running a series of queries (max timestamp, counts per day, distinct counts, etc.) that waste time and add unnecessary load on the databases. There's no clear way to find the people who can answer questions about a table. And worst of all, analyses are often redone and models rebuilt because previous work isn't discoverable. Lyft has reduced the time it takes to discover data by 10x by building its own data portal, Amundsen. Amundsen is built on three key pillars: an augmented data graph, an intuitive user experience, and centralized metadata. Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.). What's unique to Amundsen is that it treats people as a first-class data asset; in other words, there's a graph node for each person in the organization that connects to other nodes (such as tables and dashboards). In addition, Amundsen runs PageRank using data from access logs to power search ranking, similar to how Google ranks web pages on the internet. Finally, Amundsen gathers metadata from various sources (Hive, Presto, Airflow, etc.) and exposes it in one central place. The right place to store all this metadata is a work in progress. Mark Grover and Tao Feng (Lyft) offer a demo of Amundsen and lead a deep dive into its architecture, covering how it leverages centralized metadata, PageRank, and a comprehensive data graph to achieve its goal. They also explore the future roadmap, unsolved problems, and its collaboration model. 
This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco. |
Description: | Online resource; Title from title screen (viewed October 31, 2019) |
Extent: | 1 online resource (1 video file, approximately 42 min.) |
Format: | Mode of access: World Wide Web. |
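
The summary describes a graph in which people and tables are both nodes, with PageRank computed from access logs to rank search results. The following is a minimal sketch of that idea only, not Amundsen's implementation: the log entries, the node naming scheme, and the function names `build_graph`, `pagerank`, and `rank_tables` are all hypothetical, and the power-iteration PageRank is the textbook version.

```python
from collections import defaultdict

# Hypothetical access log: one (person, table) pair per query issued.
ACCESS_LOG = [
    ("alice", "rides"),
    ("alice", "rides"),
    ("alice", "drivers"),
    ("bob", "rides"),
    ("carol", "rides"),
    ("carol", "payments"),
]


def build_graph(log):
    """Build weighted edges in which people and tables are both nodes."""
    edges = defaultdict(lambda: defaultdict(float))
    for person, table in log:
        p, t = f"person:{person}", f"table:{table}"
        # Assumption: usage is modeled as a link in both directions.
        edges[p][t] += 1.0
        edges[t][p] += 1.0
    return edges


def pagerank(edges, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over weighted out-edges."""
    nodes = sorted(edges)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, targets in edges.items():
            total = sum(targets.values())
            for dst, weight in targets.items():
                new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


def rank_tables(log):
    """Return (table, score) pairs, highest score first."""
    scores = pagerank(build_graph(log))
    tables = [
        (node.split(":", 1)[1], score)
        for node, score in scores.items()
        if node.startswith("table:")
    ]
    return sorted(tables, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_tables(ACCESS_LOG):
        print(f"{name}: {score:.3f}")
```

In this toy log the most frequently queried table ranks first. A production system would presumably also weight recency and query type, which this sketch ignores.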
Internal format
MARC
LEADER | 00000cgm a22000002c 4500 | ||
---|---|---|---|
001 | ZDB-30-ORH-048540463 | ||
003 | DE-627-1 | ||
005 | 20240228120934.0 | ||
006 | m o | | | ||
007 | cr uuu---uuuuu | ||
008 | 191206s2019 xx ||| |o o ||eng c | ||
035 | |a (DE-627-1)048540463 | ||
035 | |a (DE-599)KEP048540463 | ||
035 | |a (ORHE)0636920340027 | ||
035 | |a (DE-627-1)048540463 | ||
040 | |a DE-627 |b ger |c DE-627 |e rda | ||
041 | |a eng | ||
082 | 0 | |a E VIDEO | |
100 | 1 | |a Grover, Mark |e VerfasserIn |4 aut | |
245 | 1 | 0 | |a Disrupting data discovery |c Grover, Mark |
250 | |a 1st edition. | ||
264 | 1 | |a [Erscheinungsort nicht ermittelbar] |b O'Reilly Media, Inc. |c 2019 | |
264 | 2 | |a Boston, MA |b Safari. | |
300 | |a 1 Online-Ressource (1 video file, approximately 42 min.) | ||
336 | |a zweidimensionales bewegtes Bild |b tdi |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Online resource; Title from title screen (viewed October 31, 2019) | ||
520 | |a Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can trust them. Unfortunately, data discovery is very inefficient today. Countless hours are lost trying to find the right data to use. (The most common way still remains to ask a coworker.) Gaining trust in data requires running a bunch of queries (max timestamp, counts per day, count distincts, etc.) that waste time and add unnecessary load on the databases. There's no clear way to know how to find folks to answer questions about the table. And worst of all, many times analysis is redone and models are rebuilt because previous work isn't discoverable. Lyft has reduced the time it takes to discover data by 10x by building its own data portal, Amundsen. Amundsen is built on three key pillars: an augmented data graph, an intuitive user experience, and centralized metadata. Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.). What's unique to Amundsen is that it treats people as a first-class data asset; in other words, there's a graph node for each person in the organization that connects to other nodes (like tables, and dashboards). In addition, Amundsen runs PageRank using data from access logs to power search ranking, similar to how Google ranks web pages on the internet. Finally, Amundsen gathers metadata from various different sources (Hive, Presto, Airflow, etc.) and exposes it in one central place. The right place to store all this metadata is a work in progress. Mark Grover and Tao Feng (Lyft) offer a demo of Amundsen and lead a deep dive into its architecture, covering how it leverages centralized metadata, page rank, and a comprehensive data graph to achieve its goal. They also explore the future roadmap, unsolved problems, and its collaboration model. 
This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco. | ||
538 | |a Mode of access: World Wide Web. | ||
610 | 1 | 0 | |a Lyft (Firm) |
611 | 2 | 0 | |a Strata Conference |c San Francisco, Calif.) |d (2019 |
650 | 0 | |a Business enterprises |x Computer networks | |
650 | 0 | |a Decision making |x Data processing | |
650 | 0 | |a Electronic data processing |x Management | |
650 | 4 | |a Entreprises ; Réseaux d'ordinateurs | |
650 | 4 | |a Prise de décision ; Informatique | |
650 | 4 | |a Electronic videos | |
700 | 1 | |a Feng, Tao |e VerfasserIn |4 aut | |
710 | 2 | |a Safari, an O'Reilly Media Company. |e MitwirkendeR |4 ctb | |
966 | 4 | 0 | |l DE-91 |p ZDB-30-ORH |q TUM_PDA_ORH |u https://learning.oreilly.com/library/view/-/0636920340027/?ar |m X:ORHE |x Aggregator |z lizenzpflichtig |3 Volltext |
912 | |a ZDB-30-ORH | ||
935 | |c vide | ||
951 | |a BO | ||
912 | |a ZDB-30-ORH | ||
049 | |a DE-91 |
Record in the search index
DE-BY-TUM_katkey | ZDB-30-ORH-048540463 |
---|---|
_version_ | 1831287162272743424 |
adam_text | |
any_adam_object | |
author | Grover, Mark Feng, Tao |
author_corporate | Safari, an O'Reilly Media Company |
author_corporate_role | ctb |
author_facet | Grover, Mark Feng, Tao Safari, an O'Reilly Media Company |
author_role | aut aut |
author_sort | Grover, Mark |
author_variant | m g mg t f tf |
building | Verbundindex |
bvnumber | localTUM |
collection | ZDB-30-ORH |
ctrlnum | (DE-627-1)048540463 (DE-599)KEP048540463 (ORHE)0636920340027 |
dewey-raw | E VIDEO |
dewey-search | E VIDEO |
edition | 1st edition. |
format | Electronic Video |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>03765cgm a22004932c 4500</leader><controlfield tag="001">ZDB-30-ORH-048540463</controlfield><controlfield tag="003">DE-627-1</controlfield><controlfield tag="005">20240228120934.0</controlfield><controlfield tag="006">m o | | </controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">191206s2019 xx ||| |o o ||eng c</controlfield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627-1)048540463</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)KEP048540463</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ORHE)0636920340027</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627-1)048540463</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">E VIDEO</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Grover, Mark</subfield><subfield code="e">VerfasserIn</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Disrupting data discovery</subfield><subfield code="c">Grover, Mark</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1st edition.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">[Erscheinungsort nicht ermittelbar]</subfield><subfield code="b">O'Reilly Media, Inc.</subfield><subfield code="c">2019</subfield></datafield><datafield tag="264" ind1=" " ind2="2"><subfield code="a">Boston, MA</subfield><subfield code="b">Safari.</subfield></datafield><datafield 
tag="300" ind1=" " ind2=" "><subfield code="a">1 Online-Ressource (1 video file, approximately 42 min.)</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">zweidimensionales bewegtes Bild</subfield><subfield code="b">tdi</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Online resource; Title from title screen (viewed October 31, 2019)</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can trust them. Unfortunately, data discovery is very inefficient today. Countless hours are lost trying to find the right data to use. (The most common way still remains to ask a coworker.) Gaining trust in data requires running a bunch of queries (max timestamp, counts per day, count distincts, etc.) that waste time and add unnecessary load on the databases. There's no clear way to know how to find folks to answer questions about the table. And worst of all, many times analysis is redone and models are rebuilt because previous work isn't discoverable. Lyft has reduced the time it takes to discover data by 10x by building its own data portal, Amundsen. Amundsen is built on three key pillars: an augmented data graph, an intuitive user experience, and centralized metadata. Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.). 
What's unique to Amundsen is that it treats people as a first-class data asset; in other words, there's a graph node for each person in the organization that connects to other nodes (like tables, and dashboards). In addition, Amundsen runs PageRank using data from access logs to power search ranking, similar to how Google ranks web pages on the internet. Finally, Amundsen gathers metadata from various different sources (Hive, Presto, Airflow, etc.) and exposes it in one central place. The right place to store all this metadata is a work in progress. Mark Grover and Tao Feng (Lyft) offer a demo of Amundsen and lead a deep dive into its architecture, covering how it leverages centralized metadata, page rank, and a comprehensive data graph to achieve its goal. They also explore the future roadmap, unsolved problems, and its collaboration model. This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco.</subfield></datafield><datafield tag="538" ind1=" " ind2=" "><subfield code="a">Mode of access: World Wide Web.</subfield></datafield><datafield tag="610" ind1="1" ind2="0"><subfield code="a">Lyft (Firm)</subfield></datafield><datafield tag="611" ind1="2" ind2="0"><subfield code="a">Strata Conference</subfield><subfield code="c">San Francisco, Calif.)</subfield><subfield code="d">(2019</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Business enterprises</subfield><subfield code="x">Computer networks</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Decision making</subfield><subfield code="x">Data processing</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Electronic data processing</subfield><subfield code="x">Management</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Entreprises ; Réseaux d'ordinateurs</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Prise de décision ; 
Informatique</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Electronic videos</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Feng, Tao</subfield><subfield code="e">VerfasserIn</subfield><subfield code="4">aut</subfield></datafield><datafield tag="710" ind1="2" ind2=" "><subfield code="a">Safari, an O'Reilly Media Company.</subfield><subfield code="e">MitwirkendeR</subfield><subfield code="4">ctb</subfield></datafield><datafield tag="966" ind1="4" ind2="0"><subfield code="l">DE-91</subfield><subfield code="p">ZDB-30-ORH</subfield><subfield code="q">TUM_PDA_ORH</subfield><subfield code="u">https://learning.oreilly.com/library/view/-/0636920340027/?ar</subfield><subfield code="m">X:ORHE</subfield><subfield code="x">Aggregator</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-30-ORH</subfield></datafield><datafield tag="935" ind1=" " ind2=" "><subfield code="c">vide</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">BO</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-30-ORH</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91</subfield></datafield></record></collection> |
id | ZDB-30-ORH-048540463 |
illustrated | Not Illustrated |
indexdate | 2025-05-05T13:25:32Z |
institution | BVB |
language | English |
open_access_boolean | |
owner | DE-91 DE-BY-TUM |
owner_facet | DE-91 DE-BY-TUM |
physical | 1 Online-Ressource (1 video file, approximately 42 min.) |
psigel | ZDB-30-ORH TUM_PDA_ORH ZDB-30-ORH |
publishDate | 2019 |
publishDateSearch | 2019 |
publishDateSort | 2019 |
publisher | O'Reilly Media, Inc. |
record_format | marc |
spelling | Grover, Mark VerfasserIn aut Disrupting data discovery Grover, Mark 1st edition. [Erscheinungsort nicht ermittelbar] O'Reilly Media, Inc. 2019 Boston, MA Safari. 1 Online-Ressource (1 video file, approximately 42 min.) zweidimensionales bewegtes Bild tdi rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Online resource; Title from title screen (viewed October 31, 2019) Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can trust them. Unfortunately, data discovery is very inefficient today. Countless hours are lost trying to find the right data to use. (The most common way still remains to ask a coworker.) Gaining trust in data requires running a bunch of queries (max timestamp, counts per day, count distincts, etc.) that waste time and add unnecessary load on the databases. There's no clear way to know how to find folks to answer questions about the table. And worst of all, many times analysis is redone and models are rebuilt because previous work isn't discoverable. Lyft has reduced the time it takes to discover data by 10x by building its own data portal, Amundsen. Amundsen is built on three key pillars: an augmented data graph, an intuitive user experience, and centralized metadata. Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.). What's unique to Amundsen is that it treats people as a first-class data asset; in other words, there's a graph node for each person in the organization that connects to other nodes (like tables, and dashboards). In addition, Amundsen runs PageRank using data from access logs to power search ranking, similar to how Google ranks web pages on the internet. Finally, Amundsen gathers metadata from various different sources (Hive, Presto, Airflow, etc.) and exposes it in one central place. 
The right place to store all this metadata is a work in progress. Mark Grover and Tao Feng (Lyft) offer a demo of Amundsen and lead a deep dive into its architecture, covering how it leverages centralized metadata, page rank, and a comprehensive data graph to achieve its goal. They also explore the future roadmap, unsolved problems, and its collaboration model. This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco. Mode of access: World Wide Web. Lyft (Firm) Strata Conference San Francisco, Calif.) (2019 Business enterprises Computer networks Decision making Data processing Electronic data processing Management Entreprises ; Réseaux d'ordinateurs Prise de décision ; Informatique Electronic videos Feng, Tao VerfasserIn aut Safari, an O'Reilly Media Company. MitwirkendeR ctb |
spellingShingle | Grover, Mark Feng, Tao Disrupting data discovery Lyft (Firm) Strata Conference San Francisco, Calif.) (2019 Business enterprises Computer networks Decision making Data processing Electronic data processing Management Entreprises ; Réseaux d'ordinateurs Prise de décision ; Informatique Electronic videos |
title | Disrupting data discovery |
title_auth | Disrupting data discovery |
title_exact_search | Disrupting data discovery |
title_full | Disrupting data discovery Grover, Mark |
title_fullStr | Disrupting data discovery Grover, Mark |
title_full_unstemmed | Disrupting data discovery Grover, Mark |
title_short | Disrupting data discovery |
title_sort | disrupting data discovery |
topic | Lyft (Firm) Strata Conference San Francisco, Calif.) (2019 Business enterprises Computer networks Decision making Data processing Electronic data processing Management Entreprises ; Réseaux d'ordinateurs Prise de décision ; Informatique Electronic videos |
topic_facet | Lyft (Firm) Strata Conference San Francisco, Calif.) (2019 Business enterprises Computer networks Decision making Data processing Electronic data processing Management Entreprises ; Réseaux d'ordinateurs Prise de décision ; Informatique Electronic videos |
work_keys_str_mv | AT grovermark disruptingdatadiscovery AT fengtao disruptingdatadiscovery AT safarianoreillymediacompany disruptingdatadiscovery |