Hee'World
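Spark SQL (PySpark)
- Before Spark, Hive was the de facto standard for SQL on Hadoop
- Registering a DataFrame with createOrReplaceTempView makes it queryable with SQL
- Global TempView
• Declared so that the view can be used across every SparkSession in the application
• A view created with createOrReplaceTempView is visible only in the current SparkSession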
Spark SQL
- Use a Spark DataFrame like a database table
In [1]:
import pandas as pd
Create a pandas DataFrame
In [5]:
pandf = pd.read_csv("data/Uber-Jan-Feb-FOIL.csv", header=0)
In [6]:
pandf.head()
Out[6]:
Create a DataFrame with the Spark session
In [7]:
uberDF = spark.read.csv("data/Uber-Jan-Feb-FOIL.csv", inferSchema=True, header=True)
# Equivalent reader form with explicit options
uberDF = spark.read.format("csv").option("header", True).option("inferSchema", True).load("data/Uber-Jan-Feb-FOIL.csv")
In [8]:
uberDF.show()
In [9]:
uberDF.createOrReplaceTempView("uber")
Spark SQL SELECT
In [13]:
# show() returns None, so keep the DataFrame and display it separately
spark_select = spark.sql("select * from uber limit 10")
spark_select.show()
SELECT columns with LIMIT
In [16]:
spark.sql("select date, dispatching_base_number from uber limit 10").show()
SELECT DISTINCT
In [18]:
spark.sql("select distinct dispatching_base_number from uber").show()
WHERE
In [19]:
spark.sql("select count(*) from uber where trips > 2000").show()
DISTINCT, SUM, GROUP BY, ORDER BY
In [21]:
spark.sql("""select distinct dispatching_base_number,
sum(trips) tripsum
from uber
group by dispatching_base_number
order by tripsum desc""").show()
In [22]:
spark.sql("""select distinct date,
sum(trips) tripsum
from uber
group by date
order by tripsum desc limit 10""").show()
BETWEEN
In [25]:
spark.sql("select * from uber where trips between 1000 and 2000 limit 10").show()