Lecture Administration and visualization: Chapter 6 - Tools for data visualization
Pages: 33
File type: pdf
File size: 2.55 MB
Document information:
The lecture "Administration and visualization: Chapter 6 - Tools for data visualization" covers three kinds of visualization (mathematical, scientific, and information visualization), an introduction to pandas and numpy, and an introduction to matplotlib, among other topics.
Content extracted from the document:
Chapter 6: Tools for Data Visualization
Lecture 0: Introduction to Course

Outline
1. Overview
2. Introduction to pandas, numpy
3. Introduction to matplotlib

1. Overview
• Three kinds of visualization
  • Mathematical Visualization
  • Scientific Visualization
  • Information Visualization

Mathematical Visualization
• The data results from a mathematical equation
• Missing data can be readily generated by a computer program

Scientific Visualization
• Visualization of scientific data
• The data is measured by real-world scientific devices or comes from expensive simulations
• Coordinate data
  • Spatial coordinates
  • Temperature, pressure
  • Time

Information Visualization
• Visualization of more abstract, non-coordinate data
• Processes abstract data into a more concrete form that an observer can perceive more effectively

Modes of Visualization
Interactive Visualization
• Used for discovery
• Intended for a single investigator
• Re-renders based on user input
Presentation Visualization
• Used for communication
• Intended for a large group or mass audience
• Does not support user input

Goal of visualization
• Comparison
• Distribution
• Relationship

Data Visualization Framework

Data Types
• Discrete
• Continuous
• Ordered (values are comparable)
• Unordered (values are not comparable)

2. Introduction to numpy, pandas
• Python provides several libraries for manipulating data
• numpy: a basic library for working with arrays
• pandas: another library with more functionality

NumPy
• Stands for "Numerical Python" or "Numeric Python"
• Introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
• Provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
• Many other Python libraries are built on NumPy
Link: http://www.numpy.org/

Pandas
• Adds data structures and tools designed to work with table-like data (its Series and DataFrame objects are similar to vectors and data frames in R)
• Provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation, etc.
• Allows handling missing data
Link: http://pandas.pydata.org/

Loading Python Libraries
In [ ]:
    # Import Python libraries
    import numpy as np
    import scipy as sp
    import pandas as pd
    import matplotlib as mpl
    import seaborn as sns

Reading data using pandas
In [ ]:
    # Read csv file
    df = pd.read_csv(URI)
Note: URI contains the link to the data file.
The above command has many optional arguments to fine-tune the data import process.
There are a number of pandas commands to read other data formats:
    pd.read_excel('myfile.xlsx', sheet_name='Sheet1', index_col=None, na_values=['NA'])
    pd.read_stata('myfile.dta')
    pd.read_sas('myfile.sas7bdat')
    pd.read_hdf('myfile.h5', 'df')

Exploring data frame
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    iris = pd.read_csv('../input/Iris.csv')
    iris.head()

Data Frame data types
Pandas Type    Native Python Type    Description
object         string                The most general dtype. Assigned to a column if the column has mixed types (numbers and strings).
int64          int                   Integer values. 64 refers to the number of bits allocated to hold each value.
float64        float                 ...
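
As a minimal sketch of the reading and exploring steps above, the following reads a small CSV with pandas, inspects the column dtypes, and handles a missing value. The sample data, the column names, and the in-memory io.StringIO source are assumptions made for illustration; they stand in for the lecture's data file, which is not reproduced here. The last lines show the kind of vectorized NumPy arithmetic that the NumPy slide describes.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data (not from the lecture); "NA" marks a missing score.
CSV_TEXT = """name,age,score
Ann,31,4.5
Bob,27,NA
Cy,45,3.0
"""

# read_csv accepts a path, a URL, or a file-like object.
# na_values lists the strings that should be treated as missing.
df = pd.read_csv(io.StringIO(CSV_TEXT), na_values=["NA"])

# First rows and per-column dtypes, as in the "Exploring data frame" slide.
print(df.head())
print(df.dtypes)

# Handling missing data: count the gaps, then fill them with the column mean.
n_missing = int(df["score"].isna().sum())
filled_scores = df["score"].fillna(df["score"].mean())

# Vectorized NumPy arithmetic: one expression applies to every element,
# with no explicit Python loop.
ages = df["age"].to_numpy()
ages_in_months = ages * 12
print(n_missing, filled_scores.tolist(), ages_in_months)
```

Filling with the mean is only one possible choice for missing values; dropping the affected rows with dropna() is another common option.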