Hee'World
Spark SQL (PySpark)
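- Before Spark, Hive was the de facto standard for SQL on Hadoop
- Registering a DataFrame with createOrReplaceTempView makes it queryable with SQL
- Global TempView
• Created with createOrReplaceGlobalTempView so that it is available across all Spark sessions of the application, and queried through the global_temp database
• A view registered with createOrReplaceTempView is available only in the current SparkSession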
Spark SQL
- Use a Spark DataFrame like a database table
In [1]:
import pandas as pd
Creating a pandas DataFrame
In [5]:
pandf = pd.read_csv("data/Uber-Jan-Feb-FOIL.csv", header=0)
In [6]:
pandf.head()
Creating a Spark DataFrame from the Spark session
In [7]:
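# Shorthand reader: infer column types and treat the first row as the header
uberDF = spark.read.csv("data/Uber-Jan-Feb-FOIL.csv", inferSchema=True, header=True)
# Equivalent generic form; option() takes a key and a value
spark.read.format("csv").option("header", True).option("inferSchema", True).load("data/Uber-Jan-Feb-FOIL.csv")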
In [8]:
uberDF.show()
In [9]:
uberDF.createOrReplaceTempView("uber")
Spark SQL SELECT
In [13]:
# show() prints the rows and returns None, so its result is not assigned
spark.sql("select * from uber limit 10").show()
SELECT column limit
In [16]:
spark.sql("select date, dispatching_base_number from uber limit 10").show()
SELECT DISTINCT
In [18]:
spark.sql("select distinct dispatching_base_number from uber").show()
WHERE
In [19]:
spark.sql("select count(*) from uber where trips > 2000").show()
distinct, sum, group by, order by
In [21]:
spark.sql("""select distinct dispatching_base_number,
sum(trips) tripsum
from uber
group by dispatching_base_number
order by tripsum desc""").show()
In [22]:
spark.sql("""select distinct date,
sum(trips) tripsum
from uber
group by date
order by tripsum desc limit 10""").show()
between
In [25]:
spark.sql("select * from uber where trips between 1000 and 2000 limit 10").show()